I guess we didn't establish the video with an intro, so maybe I'll do it. So welcome, everybody. This is one of our weekly-ish tech talks, and we're here today with a couple of guests from Facebook to talk about the Hack programming language. I will leave it to you, Josh, to tell us a little bit more about your role at Facebook and everything you're doing there. Well, thanks. So yeah, I'm a software engineer at Facebook. I currently work on the Hack team, mostly doing open-source stuff, but working on the Hack type system in particular. Hack is Facebook's dialect of PHP, for those of you who don't know, and I'm here to talk about Hack, an evolution of PHP. In particular, I've subtitled this talk "Continuing to Take PHP Seriously." In a lot of ways, it is an evolution of a talk given by Keith Adams at Strange Loop a couple of years ago about the good points of PHP and actually taking the language seriously, and then about Hack taking the language even further than that. This talk is divided into basically four parts. First, I'm going to give some background on PHP for those who aren't familiar with it. Then I'm going to introduce Hack and talk about some of the ways Hack fixes problems with PHP while keeping a lot of the good things about the language. Next, I'm going to talk about a couple of killer features of Hack, in particular its static type system and async functions. Finally, I'm going to go through some of the bonus features of the language, a bunch of smaller things that are also really useful. So, PHP. PHP is a language that people love to hate, but it turns out that a lot of engineers are really productive in PHP. It has a bunch of nice features that I don't see in most, if any, other programming languages. In particular, it has a really fast edit-refresh cycle that allows for fast developer iteration.
You can go write some code, save it in your editor, go to your browser, press refresh, and see your changes immediately. There's no lengthy compilation step, no type-checking step, nothing like that. Changes show up as soon as you refresh. That fast feedback loop has served Facebook really, really well: it makes developers really productive, because they can see changes like that and iterate quickly. I think that's a really good thing about the language in general. PHP also has a really nice conceptual model: the request model. When you start a request, you start with a blank slate. You maybe get some stuff out of the GET and POST params, fetch from a database or memcache, and do some computations. Then, at the end of the request, everything you have done that you haven't explicitly persisted to a database or something like that is thrown away. There's no global state that you don't explicitly read or explicitly persist. Basically, the web server is stateless. This is a really nice mental model to work with, because you don't have to worry about global state leaking between requests. You don't need locks or mutexes, and concurrency problems like that can't come up; the language model completely disallows them. That said, PHP is a language that people love to hate. In my mind, this is a pretty iconic image: the double-clawed PHP hammer, which a guy named Ian Baker built and photographed based on a rant that someone else wrote called "PHP: a fractal of bad design." And this guy actually works for us. That is good to know. Sorry, I realized I forgot to say that I would like to hold questions and other comments until the end. In any case, the double-clawed hammer for PHP, from a rant called "a fractal of bad design." People really love to hate PHP and love to write rants about how awful the language is. And at least some of that hate is warranted. So here's one of my favorite examples. Here's some PHP code on the left.
You can see that there's this "first" and "second" here; I'll use the laser pointer for the people in the room. This is an array literal with two elements, first and second, both string literals. Then we have this list assignment syntax, which is sort of the primitive pattern matching that PHP has. And then we're going to var_dump, in other words print out, the contents of the variables $a and $b. What is this going to print? It's going to print two strings, first and second. This is basically the definition of the list assignment syntax: take an array, assign out the elements of the array, and print them. So what happens if you do that to something that isn't an array? What happens if you do it to a string? In some languages, strings are arrays of characters, in which case you would expect this to print the first two letters of the string. In other languages, strings aren't arrays of characters; they're buckets of bytes, or you can model them various other ways, and so you wouldn't expect this to work at all. Maybe it would throw an error or something like that. So what does this do in PHP? Well, both variables end up null, and so it prints two nulls. So I guess strings aren't really arrays of characters in PHP. So then what does this code do? If I assign the string to a variable first, well, I guess strings are arrays of characters here now. This is the sort of thing that PHP loves to do. I am absolutely picking on specific corner cases in order to showcase some of the absurdities of the language. Most of the language isn't like this, but there are corner cases like this, really bizarre behavior, in a lot of places. Individually, this is not a big deal. Basically, just don't do this.
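For comparison, here is a minimal Hack sketch of the same list() assignment. It assumes a recent HHVM release, where a .hack file needs no opening tag and execution starts at an <<__EntryPoint>> function (the talk-era slides used <?hh instead). The string 'hello' is an arbitrary stand-in, and the string case is left as a comment because, as the talk says later, Hack rejects it at type-check time.

```hack
// Sketch for a recent HHVM release; save as a .hack file (no <?hh header).
<<__EntryPoint>>
function list_demo(): void {
  // Destructuring a two-element vec assigns its elements in order.
  list($a, $b) = vec['first', 'second'];
  echo $a.' '.$b."\n"; // prints "first second"

  // Destructuring a string is rejected by the Hack typechecker, so this
  // PHP corner case never reaches runtime:
  // list($c, $d) = 'hello';
}
```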
Still, these corner cases come up often enough that in aggregate they start costing you time. For you as an individual developer, and for an organization or an engineering team, things like this add up and add up and add up, and cause bugs, and generally waste time. So those are some of the good parts and some of the ugly parts of PHP, which brings me to Hack. I've been complaining about PHP and framing that in terms of Hack, so what is Hack? Hack is Facebook's dialect of PHP. It is PHP with some of the rough edges sanded off and oodles more features. In particular, it has a strong static type system that helps sand off some of those rough edges. It has first-class support for async functions, which help the programmer express ways to batch together IO. And it also has a bunch of other goodies that I will talk about later. So what does Hack look like? Here on the left, we have a little bit of PHP. It opens with the typical opening tag, <?php. I define a function increment, which takes a parameter $x and returns $x plus 1, and a function f, which calls increment with 42 and echoes the result. So if I were to call f, it would print 43. This is PHP. You're mostly all familiar with PHP; if you're not, it is like most other scripting languages. On the right is the equivalent Hack code. The opening tag has changed from <?php to <?hh, to indicate that this is Hack instead of PHP. But other than that, this code doesn't really look that different. The increment function still takes a parameter $x and returns $x plus 1; most of the syntax here is the same. The only other change is that I've added types in a few places: increment is typed to take an int and return an int, and the function f doesn't return a value, so it returns void.
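A sketch of the typed Hack version of increment and f described above. It assumes a recent HHVM release, so it omits the <?hh tag from the slide and adds an <<__EntryPoint>> function, which newer HHVM requires to run a file.

```hack
// Sketch for a recent HHVM release; the slide's <?hh header is not needed
// in a .hack file, but an explicit entry point is.
function increment(int $x): int {
  return $x + 1;
}

function f(): void {
  echo (string)increment(42)."\n"; // prints "43"
}

<<__EntryPoint>>
function main(): void {
  f();
}
```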
As it turns out, the types that I've added in this Hack code on the right are just for illustration purposes. The language is gradually typed, meaning the typing is all optional, so you could have left all of those types off and it would still be valid Hack. I only added them so that I didn't have literally identical code on the left and the right. Hack is very much in the spirit of PHP: it's a dialect of PHP, and it looks an awful lot like PHP, which brings me to one of the guiding ideas of Hack. Hack is all about developer efficiency. It's about making things better without switching to a totally new language or relearning everything. If you know PHP, you know Hack. It interoperates seamlessly with existing PHP code. You don't have to go rewrite it all. You can convert some corner of the code base if that makes sense, use new features as that makes sense, or write some small project in it that interoperates with the existing code. But it's more than that. It is also an evolution of PHP that provides new features, such as asynchronous IO and a real, strong static type system to help prevent bugs, to move beyond some of the common pitfalls and many of the things people like to make fun of in PHP, and to actually take that base, make it work even better, and build on top of it. So, really quickly, what does the Hack workflow look like? It's not all that different from the PHP workflow. You're going to write some code and save it in your editor. At this point, however, the Hack type checker gets invoked, and it shows you type errors in your editor with a response time of about 100 milliseconds. We value the speed of the type checker, to make sure that we keep the fast edit-refresh cycle that I talked about with PHP. So now you have an even faster feedback loop.
The second you save your code, instead of having to tab to the browser, refresh, discover that you've made some silly syntax error or type error or typo, get a blank page, and go dig through the error logs, you get a basic set of errors directly in your editor with an instant response time. That saves you from the silly bugs, so that you can focus on the more complicated things when you tab to your browser and continue the normal, fast iteration cycle of PHP. At this point, most presentations would go and talk about the type system of Hack. A lot of folks think of Hack as PHP plus types, and so naturally the next thing you talk about is the type system and how we use it to sand off some of the rough edges of PHP. But Hack is much more than just PHP plus types. So I'm actually not going to talk about the static type system quite yet. I'm going to talk about what I think is one of the major killer features of Hack, in particular for Wikipedia, which is async functions. Wikipedia is already running on HHVM, so whenever your code is actually running on the CPU, you're getting the speed benefits of HHVM. However, the latency of a request is not all about burning CPU. A lot of the actual wall time of requests is IO wait as well. Async functions are designed to help deal with IO wait and make it faster, in the spirit of Hack, without having to dramatically change the way you think about your code. There may be a little bit of minor restructuring, but it's not a massive change to the way you currently write, think about, and model code. This is really a killer feature for Facebook. I recently heard a stat that when you go to the Facebook home page and load your News Feed, you execute about 1 million async functions over the course of doing that. They're super heavily used at Facebook. It's how we make Facebook as fast as it is.
It's how we manage all of our complicated IO. So, async functions: what do they look like in Hack? Here's an example of a simple async function. It calls this curl_exec function, which will load the Graph endpoint for Mark Zuckerberg, get his data, JSON-decode it, and return it. So what do you first notice about this function? There are two new keywords here, async and await. The async keyword up here is a modifier on the function declaration. It says this is an async function: this function might be suspended if it ends up blocking on IO, and the runtime is allowed to go off and run some other function if this one gets blocked. Then there's this await keyword. The await keyword means: take this other async function here, which is going to go off and do something, and execute it until it blocks. If it blocks, go off and run some other async function that is waiting to run, or one that was blocked and whose data has come back, and resume that. When you see this await statement, that's where the runtime is allowed to suspend execution of an async function if it gets blocked on IO in the course of running this curl_exec. With just a single await like this, we're basically wrapping the curl_exec: an async function awaiting another async function that goes off and loads a Graph endpoint. This isn't really doing a whole lot, and you don't see the benefits of batching and all of that with this simple function. So let's look at a more complicated example. Here's another function, which I've cleverly called getData. This function is going to await all three of these curl_execs at once. We're going to use a helper function in the standard library called HH\Asio\v; that stands for Hack Asynchronous IO, vector. It takes a Vector of awaitables. Vector is another feature of Hack; it's a Hack collection. I won't go into details about collections now. You can think of it as an array for now.
It's going to call these three curl_exec functions, get their results back into this $v variable, and then do something with them. This code is going to run all three curl_execs in parallel. Suppose the first one takes time a, the second time b, and the third time c. With synchronous code, this would take time a plus b plus c to execute. But with the asynchronous features of Hack, this will take max(a, b, c) to run, because as soon as we get blocked making the first curl request, we'll go fire off the second, and when that one blocks, we'll go fire off the third. The runtime deals with the coordination of resuming you when all three are done and assigning the results back into $v for you. Now, if you're really familiar with PHP, you might be thinking that PHP has had a function, curl_multi_exec or something like that, for quite a while. You can give it a list of URLs, and it will basically do just this for you. So this isn't really showcasing the power of async functions, or why it matters to have them as a first-class feature in the language. Here's an example that will showcase that: yet another cleverly named getData function. Again, this getData function is going to await a vector of things. But because the runtime is doing all the coordination for you, it can be a completely heterogeneous set of things. The first thing I might do is, again, fetch some JSON from a Graph endpoint. But I also want to go to some database, make a MySQL query, and fetch that data as well. And while that or the curl request is blocked, I want to go off and do something else: I know Wikimedia's software uses a bunch of Lua extensions to compute things, which, as far as the Hack runtime is concerned, you can think of as sort of IO-ish.
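Here is a minimal, runnable sketch of the single-await function and the three-way getData described above. Because a real Graph API call needs network access, fake_fetch_async is a hypothetical stand-in that just sleeps with HH\Asio\usleep and returns a label. The sketch also swaps in Hack's concurrent block, available in recent HHVM releases, in place of the HH\Asio\v helper mentioned in the talk; both await several awaitables together.

```hack
// Hypothetical stand-in for a network call such as a Graph endpoint fetch:
// it waits $usecs microseconds, then returns $label.
async function fake_fetch_async(string $label, int $usecs): Awaitable<string> {
  await \HH\Asio\usleep($usecs);
  return $label;
}

// One await: this function simply wraps another async function.
async function get_one_async(): Awaitable<string> {
  return await fake_fetch_async('zuck', 100000);
}

// Three awaits in one concurrent block: the waits overlap, so the wall
// time is roughly max(a, b, c) rather than a + b + c.
async function get_data_async(): Awaitable<vec<string>> {
  concurrent {
    $a = await fake_fetch_async('a', 300000);
    $b = await fake_fetch_async('b', 200000);
    $c = await fake_fetch_async('c', 100000);
  }
  // Results are bound to the variables they were awaited into,
  // regardless of which fetch finished first.
  return vec[$a, $b, $c];
}

<<__EntryPoint>>
function main(): void {
  echo \HH\Asio\join(get_one_async())."\n"; // zuck
  foreach (\HH\Asio\join(get_data_async()) as $label) {
    echo $label."\n"; // a, then b, then c
  }
}
```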
Lua extensions like that can run in the background in a separate thread without interacting with the rest of the runtime, so that's sort of like doing IO; it's sort of like an RPC on the same machine. You can absolutely model that as asynchronous IO; it's not that complicated. So while we're waiting on data to come back from the database, we can go compute stuff in this Lua extension, and we can execute all three of these things at once and let the runtime deal with the coordination of what finishes when, exactly how we thread things through, et cetera. We can deal with a very heterogeneous set of IO here, because the runtime is coordinating it all for us. More interestingly, let's actually look at the definition of the fetchFromDB function. This fetchFromDB function is probably some other function that you wrote yourself. It's another async function, where I await a connection to localhost and then run a query, SELECT * FROM whatever. It doesn't really matter what the query is. The point is that you can write async functions in terms of other async functions, build up small reusable units, and build more complicated async queries on top of those. And because the runtime is coordinating all of it for you, it will deal with suspending things when they're blocked, resuming them when their data is available, running other things while those are blocked, et cetera. Again, the runtime coordinates all of this for you. Now, I've been talking a lot about things running concurrently and running in the background. But one of the major things I said was nice about PHP is the linear request model: no global state shared between requests, no locks, no mutexes, nothing like that. So how do async functions keep that nice property of PHP?
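The heterogeneous getData and the composed fetchFromDB described above can be sketched as follows. Every function here is a hypothetical stand-in (simulated with HH\Asio\usleep delays), not a real database, HTTP, or Lua API; the point is only the shape: an async function built from other async functions, awaited alongside unrelated kinds of work in one concurrent block, which assumes a recent HHVM release.

```hack
// All names below are hypothetical stand-ins, not real library APIs.

// Simulated HTTP fetch, e.g. of a Graph endpoint.
async function fetch_json_async(string $url): Awaitable<string> {
  await \HH\Asio\usleep(3000);
  return 'graph:'.$url;
}

// Simulated connect and query; each await is a possible suspension point.
async function connect_async(string $host): Awaitable<string> {
  await \HH\Asio\usleep(1000);
  return 'conn:'.$host;
}

async function query_async(string $conn, string $sql): Awaitable<vec<string>> {
  await \HH\Asio\usleep(2000);
  return vec[$conn.' -> '.$sql];
}

// Built from smaller async functions, in the spirit of fetchFromDB.
async function fetch_from_db_async(): Awaitable<vec<string>> {
  $conn = await connect_async('localhost');
  return await query_async($conn, 'SELECT * FROM whatever');
}

// Simulated background computation, standing in for the Lua extension.
async function compute_in_background_async(int $n): Awaitable<int> {
  await \HH\Asio\usleep(1500);
  return $n * 2;
}

// A heterogeneous batch: HTTP, database and computation all overlap.
async function get_data_async(): Awaitable<(string, vec<string>, int)> {
  concurrent {
    $json = await fetch_json_async('/zuck');
    $rows = await fetch_from_db_async();
    $score = await compute_in_background_async(21);
  }
  return tuple($json, $rows, $score);
}

<<__EntryPoint>>
function main(): void {
  list($json, $rows, $score) = \HH\Asio\join(get_data_async());
  echo $json."\n";            // graph:/zuck
  echo $rows[0]."\n";         // conn:localhost -> SELECT * FROM whatever
  echo (string)$score."\n";   // 42
}
```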
The answer, which I've sort of alluded to as I've described this, is that async functions in Hack are cooperative multitasking. That means your main body of Hack code is still single-threaded, so there's no threading in the mental model at all. It's cooperative in that your code might get suspended wherever you have an await statement, and some other async function might begin to execute there. But you know when that can happen: exactly wherever you wrote an await. So there's no preemption; you don't need locking or anything like that, and you don't have to worry about critical sections. Of course, there is all sorts of threading and locking and nastiness like that happening behind the scenes in the runtime, but the programmer writing Hack on top has to worry about none of it. The programmer thinks in terms of async functions: building up computations out of linear data dependencies, async functions awaiting other async functions, or awaiting batches of async functions. No mutexes, no locking. All of that is deep in the bowels of the runtime, an implementation detail hidden away from the programmer. You don't have to think about anything like that. The idea of asynchronous IO is not new to Hack, of course. As a matter of fact, the syntax and semantics in Hack are very much inspired by C#, which has an extremely similar mechanism. And there are other popular web programming platforms that do asynchronous IO. Node.js is a really good example; it's popular for web development, and it does asynchronous IO. I think most things in Node.js are asynchronous, or easy to make asynchronous if they're not. But it doesn't model asynchronous IO quite the same way that Hack does. It models it with callbacks. You can put all sorts of sugar on top of that, with neat things like promises.
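To make the cooperative-scheduling point concrete, here is a small sketch, assuming a recent HHVM release with the concurrent block and HH\Asio\later (which yields so other ready work can run). Two async functions share one Vector and each bump it 1000 times with an unguarded read-modify-write. Because a function can only be suspended at an await, the read and the write can never interleave, so the final count is always 2000 without any lock.

```hack
// Bumps a shared counter $times times, yielding after every increment.
async function bump_async(Vector<int> $counter, int $times): Awaitable<void> {
  for ($i = 0; $i < $times; $i++) {
    // There is no await between this read and this write, so no other
    // async function can run in between: no lock is needed.
    $counter[0] = $counter[0] + 1;
    // This is the only point where this function may be suspended.
    await \HH\Asio\later();
  }
}

async function run_async(): Awaitable<int> {
  $counter = Vector {0}; // a Vector is an object, so both functions share it
  concurrent {
    await bump_async($counter, 1000);
    await bump_async($counter, 1000);
  }
  return $counter[0];
}

<<__EntryPoint>>
function main(): void {
  echo (string)\HH\Asio\join(run_async())."\n"; // 2000
}
```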
Even with that sugar, though, you're still fundamentally writing code in terms of callbacks: do this thing, then call this anonymous function, and resume execution there. So here's a typical example of callback hell. I'm sure everyone's seen this shape of code. Even if you don't quite get your code into this pyramid shape, you're still writing code in the callback style: do this thing, and at some point later call this other function, which is the rest of the computation. That's called continuation-passing style: do a computation, then call this continuation. And that's not how I think about writing code, basically ever. CPS is the sort of thing your programming languages 101 professor might make you write code in, and it's the sort of transform that compilers do. Machines can do this for us. I don't ever like thinking about code in terms of CPS, and there's no reason most programmers should have to write code in CPS if they don't want to. The beauty of async functions in Hack is that you can keep thinking about code in terms of linear execution, even if it's not really executing linearly. The runtime manages all of that: building up continuations, suspending things, resuming the continuations later, et cetera. You don't have to transform your code beyond saying await this, await that. It turns out that the ECMAScript committee thinks this is a good idea, too, and I believe the next version of ECMAScript is going to get async functions, and presumably Node will pick that up at some point. But you can get that in Hack right now, with no new versions of ECMAScript, no transpilers, or anything like that. We have it as a first-class facility right now, so you can keep writing code the way you're used to thinking about it while still getting the benefits of batched data fetching and asynchronous IO.
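The contrast between continuation-passing style and linear async code can be shown side by side in Hack itself. This is a sketch with hypothetical helper functions, assuming a recent HHVM release: the CPS helpers invoke their callbacks immediately to stay self-contained, whereas a real event-loop API would call them later. Both versions compute the same string, but only the CPS one nests a level deeper for every dependent step.

```hack
// Hypothetical callback-style helpers (they call back immediately here).
function get_user_cps(int $id, (function(string): void) $k): void {
  $k('user'.(string)$id);
}

function get_posts_cps(string $user, (function(vec<string>): void) $k): void {
  $k(vec[$user.': post 1', $user.': post 2']);
}

// Continuation-passing style: the rest of the computation is a callback.
function summary_cps(int $id, (function(string): void) $done): void {
  get_user_cps($id, $user ==> {
    get_posts_cps($user, $posts ==> {
      $done($user.' has '.(string)\count($posts).' posts');
    });
  });
}

// Hypothetical async equivalents of the helpers above.
async function get_user_async(int $id): Awaitable<string> {
  await \HH\Asio\later();
  return 'user'.(string)$id;
}

async function get_posts_async(string $user): Awaitable<vec<string>> {
  await \HH\Asio\later();
  return vec[$user.': post 1', $user.': post 2'];
}

// The same logic with async/await reads as straight-line code.
async function summary_async(int $id): Awaitable<string> {
  $user = await get_user_async($id);
  $posts = await get_posts_async($user);
  return $user.' has '.(string)\count($posts).' posts';
}

<<__EntryPoint>>
function main(): void {
  summary_cps(7, $s ==> { echo $s."\n"; });   // user7 has 2 posts
  echo \HH\Asio\join(summary_async(7))."\n";   // user7 has 2 posts
}
```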
As a last example, I want to walk through something a little bit more complicated. Let's suppose that we are building a little social network. And like any good social network, we're going to have pieces of content, we're going to have users, and those pieces of content are going to have privacy settings associated with them. So let's look at what a simple privacy-checking function might look like. We're going to build up an async function that uses data fetching. It's going to be called canSee. It takes a user and a piece of content and determines whether the user can see that piece of content. On our social network, let's suppose we have a very simple model of privacy. There are two privacy settings a piece of content could have: it could be visible to friends of the author, or to friends of friends of the author. So let's look at how we might write that. The way I think about it, everything I just described is in terms of friends and friends of friends of people, and who's friends with whom, and who can see things. So the first thing I'm probably going to want to do is take the content, get its author, and get the friends of that author. If I have this piece of content, I can probably get the author for free; presumably I have that in memory right now. But getting the friends of that author is probably going to require some sort of database query, something like selecting the UIDs of everyone who's a friend of this person; I don't know quite what it looks like. But it's going to require some sort of data fetching, and so we're going to await it. So we do the data fetching for the author's friends, await it, and assign the result into a variable. What we need to do now depends on what the privacy of the content is, so we're going to switch on the privacy. If it is friends privacy, then we're basically done: we take this friend set and see whether it contains the user ID.
If so, then this user is a friend of the author of the piece of content, and we return true or false accordingly, and we're done with data fetching. However, let's suppose it is friends-of-friends privacy. Then we're going to do a little bit more data fetching, and we write exactly that. We already have the friends of the author, so let's get the friends of the user. If you think about this for a little bit, you can determine whether someone is a friend of a friend by just intersecting the friends of the author with the friends of the user. So that's exactly what we do: we fetch the friends of the user, intersect the two sets, take a count, and see if it's greater than zero. If it is, then you can see the content. Otherwise, you don't have any friends in common, and you fail the friends-of-friends test. So again, this is code for something fairly complicated, data-fetching-wise, that reads just like I would think about writing it as synchronous code: fetch the friends, look at the privacy, decide what to do, and then do that, fetching more data if it's needed. There's no dealing with callbacks here, no anonymous functions, nothing like that, just straight linear code. But it still has the benefit of being able to run in parallel with something else that's fetching data. You're writing asynchronous code, with asynchronous IO, in the way you're used to thinking about that code. So I've talked about one of the major features of Hack, asynchronous IO, and now I'm coming back to static types, the feature that everybody loves to think about for Hack, because it's certainly very important. I said earlier that Hack has a real, strong, static type system. What does that mean? First of all, it's a static type system: you get type errors before you run your code, directly in your editor, immediately.
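The canSee privacy check walked through above can be sketched like this, assuming a recent HHVM release. The Privacy enum, the Content class, and get_friends_async (which reads a hard-coded friend graph after a simulated delay instead of querying a database) are all hypothetical. Set and its contains, filter, and count methods are built-in Hack collection methods; the intersection is done with filter.

```hack
enum Privacy: int {
  FRIENDS = 0;
  FRIENDS_OF_FRIENDS = 1;
}

// Hypothetical content type: just an author id and a privacy setting.
final class Content {
  public function __construct(public int $authorId, public Privacy $privacy) {}
}

// Hypothetical friend lookup standing in for a database query.
async function get_friends_async(int $uid): Awaitable<Set<int>> {
  await \HH\Asio\usleep(1000); // simulated database round trip
  $graph = dict[
    1 => vec[2, 3],
    2 => vec[1],
    3 => vec[1, 4],
    4 => vec[3],
  ];
  return new Set($graph[$uid] ?? vec[]);
}

async function can_see_async(int $viewer, Content $content): Awaitable<bool> {
  $authorFriends = await get_friends_async($content->authorId);
  if ($content->privacy === Privacy::FRIENDS) {
    return $authorFriends->contains($viewer);
  }
  // Friends of friends: only now do we fetch the viewer's friends,
  // then intersect the two sets and check that it is non-empty.
  $viewerFriends = await get_friends_async($viewer);
  $common = $authorFriends->filter($f ==> $viewerFriends->contains($f));
  return $common->count() > 0;
}

<<__EntryPoint>>
function main(): void {
  // User 4 is a friend of user 3, who is a friend of author 1.
  $fof = new Content(1, Privacy::FRIENDS_OF_FRIENDS);
  echo \HH\Asio\join(can_see_async(4, $fof)) ? "visible\n" : "hidden\n"; // visible
  $friendsOnly = new Content(1, Privacy::FRIENDS);
  echo \HH\Asio\join(can_see_async(4, $friendsOnly)) ? "visible\n" : "hidden\n"; // hidden
}
```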
You can catch bugs while the code you just wrote is still paged into your head. It's also a real type system: if all of your code is in Hack's strict mode and the type checker accepts it, then you can't get type errors at runtime. So it is a real, strong, static type system. Let's go through some of the features of the type system and see what they look like. First, I'm going to write a function in Hack called evenAge. It takes a user and returns true or false: is the age of this user even? Kind of a silly function, but it serves the example. The first thing we see here is that evenAge takes a user $u, but there's a question mark in front of the User type. PHP has object types, like User, and object types in PHP are not nullable unless you give the parameter a default value of null. In Hack, we've separated default values from nullability, and this question mark is how we indicate nullability. So this is a potentially null user, because of the question mark. Next, evenAge has a return type. PHP 5 doesn't have return types, but the RFC to add them was accepted for PHP 7, so this works the same in Hack as it will in PHP 7. This function is going to return a bool. Hack has scalar types, such as bool, int, string, float, that sort of thing, and they are strictly enforced. PHP, even PHP 7, does not have scalar types; they are currently debating whether to add them. If you've been following the discussion on the mailing list, there's quite a vigorous debate about how this should work. But Hack has them right now, and they are strict, as you would expect. OK, so this function takes a potentially null user, and it's going to return a bool saying whether the age is even. Now, what I'd really like to do is get the age of this user and take it mod 2; that will determine whether the age is even.
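A sketch of the evenAge function being described, assuming a recent HHVM release; the User class is a hypothetical minimal version. It uses an explicit === null check, one of the idioms the flow-sensitive type checker understands (the talk's own version is "if not user, return false"). After the early return, $u is refined from ?User to User, and the local $age needs no annotation.

```hack
// Hypothetical User class for the example.
final class User {
  public function __construct(private int $age) {}

  public function getAge(): int {
    return $this->age;
  }
}

function even_age(?User $u): bool {
  // Calling $u->getAge() here would be rejected: $u may be null.
  if ($u === null) {
    return false;
  }
  // Flow-sensitive typing: after the check, $u is refined to User.
  $age = $u->getAge(); // inferred as int; no annotation needed
  return $age % 2 === 0;
}

<<__EntryPoint>>
function main(): void {
  echo even_age(null) ? "even\n" : "not even\n";         // not even
  echo even_age(new User(30)) ? "even\n" : "not even\n"; // even
  echo even_age(new User(31)) ? "even\n" : "not even\n"; // not even
}
```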
But this user is potentially null. So what do I do? I can't just call the getAge method on the user; I'd be calling a method on a potentially null object, and the type checker wouldn't like that. You can only call methods on non-null objects. So how do I turn this potentially null user into a non-null user? Well, in idiomatic PHP, you just say: if not user, return false, or some default value; it doesn't really matter what it returns here. And you can do the same in Hack. The type checker is flow-sensitive, so it understands idioms like "if not user, return," and it knows that after this if statement the user can't be null anymore. So it will allow you to call getAge on it and then check whether the age mod 2 is equal to 0; if it is, the user has an even age. This is type-correct Hack code; the type checker can type check all the way through it and verify that you're doing the right thing. Importantly, notice what's missing here: an annotation on the $age local variable. The type checker can infer that it's going to be an int, presumably because getAge is annotated to return int. In Hack, you only have to annotate at function boundaries: function parameters and function return types. All locals are inferred, which saves you an awful lot of typing and retyping of types compared to some other languages, like Java or even C. So those are some simple examples of the type system. Let's look at a more complicated example: the typical Box class. It's going to be a generic class that takes some object in its constructor and can then return it; it's a simple holder. So this is going to be Box of T. This is a generic; Hack generics look the same as they do in a lot of other languages. We're going to be parametrized over any arbitrary type T.
We're going to have a constructor, which looks like a PHP constructor except for this little bit of syntax here: private T $x. This is a neat little feature of Hack called constructor argument promotion. It simultaneously defines a constructor parameter of type T called $x, defines a private member variable on the class Box of type T called x, and assigns the parameter $x into the member variable x in the constructor. It's a really common pattern, and this is just a shorthand for it. It's a really nice shorthand, not least because it allows this example to fit on my slide. So we have this private T member variable, and then get returns T: return $this->x. So this box operates as we'd expect. Hack has generics that look like this. How do you use them, though? Let's write a function that uses this box, and it's going to return a string. To use it, I just say $box = new Box(42), so the parameter $x in the constructor is going to be the int 42, and then I return $box->get(). Notice, again, that the local here, $box, was inferred. Even though it's a generic, I don't have to write anywhere that this is a box of ints; the type checker can infer that for me. As a matter of fact, you can see that the code I've written here on the slide is a type error. Even though I haven't written whether this is a box of int or a box of string, the type checker can see the error here. Let's actually look at what the error looks like. The type checker is going to say "Invalid return type," and it points to the line and column number, as you'd expect. But the type checker does more than that. Just saying there is an error here is useful, but we can be a lot more useful than that: we can point to how we inferred what things were and what the incompatibility was.
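The generic Box class and its use can be sketched as follows, assuming a recent HHVM release. The function name use_box is hypothetical. The ill-typed return from the slide is kept as a comment so the file type-checks; the version that runs converts the int explicitly.

```hack
final class Box<T> {
  // Constructor argument promotion: declares a private property $x of
  // type T and assigns the constructor argument to it.
  public function __construct(private T $x) {}

  public function get(): T {
    return $this->x;
  }
}

function use_box(): string {
  $box = new Box(42); // inferred as Box<int>; no annotation needed
  // return $box->get(); // rejected: int is incompatible with string
  return (string)$box->get();
}

<<__EntryPoint>>
function main(): void {
  echo use_box()."\n"; // 42
}
```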
So beyond saying that this line is bad, we can say: you wrote string here, and it's incompatible with the int that you put here. We keep witnesses for all of our types whenever we're doing inference on locals, even through generics. So we can point to things like this and say: you wrote string, and I think this is a box of int, so the return is bad, and here is where the int was that made me decide this was a box of int. We give really good error messages that explain the incompatibility, unlike many languages that I've worked with, to help the programmer figure that sort of thing out. So why static types? Why is this a good thing for PHP? PHP is a great dynamic language and actually derives a lot of power from that. So why is adding a static type system on top of it, which will of necessity restrict the dynamic nature of PHP a little bit, a good thing? The obvious answer is this. Here's my example from the very beginning: this list syntax on strings. What does it do? Well, the fact that it's inconsistent is irrelevant in Hack, because these are both type errors. You just can't do this. We use the Hack type system to sand off tons and tons of edge-case absurdities like this in PHP; we just rule them out entirely with the type system. So you get that type error immediately and don't have to worry that this is bizarrely inconsistent. However, this is absolutely an edge case, and the type system is good for much more than just sanding off edge cases like this. Let's take a look at a somewhat more realistic example. Let's go back to the social network we were building earlier. Another common feature of social networks is some sort of news feed. So here's a function we might write that renders a news feed. We're going to get some raw stories from our back end, maybe ranked, maybe something like that; it doesn't really matter how this works.
Then we're going to take all of those raw stories and do some data fetching to get all of the rich story data: the photos, the sound, and whatever else we want to display in our newsfeed. Then, for each of those stories, we render it directly out to the page. So can anybody look at this code and see what the bug is? What's wrong with this code? Think about it for a second. So what's wrong with it? It depends; it's a bit of a trick question. The crux of what I'm getting at is: how do we deal with errors here? In particular, what happens if this second line, getStoryData on the raw stories, fails? Maybe for one of these raw stories a database is down and we can't fetch its photo. How do we deal with that failure? We probably don't want to throw an exception in the getStoryData function, because that exception would at least kill all of the data fetching for the raw stories, and would probably propagate out of the renderFeed function. We don't want to blank someone's entire newsfeed and fail to render it. If just one story fails, we'd like to null out that story and continue. So getStoryData is probably going to return an array with nulls in it, or something like that. But is that what it returns? Who is responsible for the error handling here? Who deals with the nulls that are inevitably going to come up? Does getStoryData return an array of potentially null entries, and is renderStory OK with that? Or is getStoryData expected to filter the stories down to the ones it could successfully fetch data for and drop the rest, so that renderStory never has to worry about it? This is a more realistic example of where a type checker can really come in handy.
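A minimal TypeScript sketch of one way to answer "who handles the nulls?" with types, assuming `strictNullChecks` (part of `strict`). All names here (`RawStory`, `StoryData`, `failedIds`, `isPresent`, `renderNewsfeed`) are hypothetical stand-ins for the slide's code, and `failedIds` only simulates a back end failing for some stories. In this design `getStoryData` keeps one entry per story and uses `null` for failures, so the nullable return type obliges the caller to filter before calling `renderStory`.

```typescript
// Hypothetical stand-ins for the talk's newsfeed data (not a real API).
interface RawStory {
  id: number;
}
interface StoryData {
  id: number;
  title: string;
}

// Contract: one entry per raw story, null where fetching failed.
// `failedIds` simulates, e.g., a database being down for those stories.
function getStoryData(raw: RawStory[], failedIds: number[]): (StoryData | null)[] {
  return raw.map((r) =>
    failedIds.indexOf(r.id) !== -1 ? null : { id: r.id, title: "Story " + r.id }
  );
}

// renderStory only accepts non-null data.
function renderStory(story: StoryData): string {
  return '<div class="story">' + story.title + "</div>";
}

// Type guard that narrows StoryData | null to StoryData.
function isPresent(s: StoryData | null): s is StoryData {
  return s !== null;
}

function renderNewsfeed(raw: RawStory[], failedIds: number[]): string[] {
  const stories = getStoryData(raw, failedIds);
  // Calling renderStory(stories[0]) directly is rejected under strictNullChecks:
  // an argument of type StoryData | null is not assignable to StoryData.
  return stories.filter(isPresent).map(renderStory);
}
```

The opposite design, where `getStoryData` filters internally and returns `StoryData[]`, is just as valid; what matters is that the signature records which side owns the check.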
A type checker can mechanically verify that either getStoryData or renderStory has dealt with your failures and your nulls, and that one of them isn't going to randomly explode whenever your database falls over, three weeks after you wrote this code without thinking about it, so that suddenly everything explodes in production. We can mechanically verify that what you've written makes sense right now. I want to be clear, because sometimes people mishear me when I talk about examples like this: type systems are not a replacement for testing in any way, shape, or form. They're a good way to help find bugs like this. They're a really good supplement to testing, and a way to mechanically get a whole class of checks for free: the consistency of your code under the type system. But they are not a replacement for testing. Please don't hear me say that.

OK, so going back to my question from a moment ago: why static types? In essence, types help you manage technical debt. Here are some examples of how. Types can help you find subtle bugs. Back to my newsfeed example, they can help you find a latent crash that happens when a database goes down, by giving you a type error immediately. They can prevent you from writing corrupt data into a database, where it stays forever unless you write some script to fix it, which is really, really painful. My favorite example of this, at least in PHP, is silently converting nulls to the integer 0 in some cases and writing those into a database, when you really wanted a null, or your code should have been doing something else. That's a good way to get data corruption. There are plenty of others, and types can help prevent many common pitfalls like that. Types can also help you flesh out APIs and make sure they make sense. Again, in my newsfeed example, they make it clear in the API who is responsible for dealing with failures and for checking for nulls.
Types make it clear who's supposed to be doing that. Again, not a panacea, but types help make APIs clear and well-defined, and the machine checks that. And finally, you will inevitably get something wrong; everyone refactors and changes things. Types help you refactor that code. If you're going to change who handles the null in my newsfeed example, you move the nullable annotation from one place to another, and the type checker immediately points out all of the places that now need null checks. Again, you can mechanically verify that what you've done makes sense. If you want to rename a method, a type checker can help you make sure you've found all of the call sites, and so on. So types help you avoid technical debt and help you fix it when you do have it. Again, not a panacea, but they help an awful lot. They've been invaluable for Facebook, and I think they're invaluable in general.

So, some other features of Hack. First is backwards compatibility. I touched on this a little bit before. Hack and PHP have the same runtime representation inside HHVM, so calls back and forth between the two languages are completely free. This means you can convert code gradually, as it makes sense. Maybe some core part of your code would really benefit from async functions. You can take that core, write it with async functions, and keep the rest in PHP. If you don't want to convert the rest right now, that's not a big deal, because the calls in and out are completely free. Or you can write some new feature, some new tool, some new something in Hack, again with completely free interoperability with all of your existing code. Wikimedia is already on HHVM, meaning you can experiment with this right now against your existing code base, since you're already set up with HHVM. But backwards compatibility doesn't just mean compatibility with your existing code.
Again, since Hack runs on HHVM, it is backwards compatible with your existing deployment strategies and your existing monitoring strategies, which already work for the large body of PHP code you have deployed in production today. But better than backwards compatibility is forward compatibility. We recently released the Hack transpiler, which converts Hack code to PHP. In particular, I understand the MediaWiki project wants to keep releases available for folks who haven't switched to HHVM yet and are still running PHP 5 or PHP 7. You can write your code in Hack and then use the Hack transpiler to generate a release that is compatible with folks who aren't running HHVM. It also means that if you decide this Hack thing was all a bad idea, you could run the Hack transpiler across all of your code at once, convert it back into PHP, and be done with it. Of course, I hope you don't do that, but it is an option. In the interest of full disclosure and honesty, the Hack transpiler is still a little bit experimental and doesn't support all of the features that I've talked about or will be talking about. I won't bother with the laundry list right now. But more importantly, if the Hack transpiler not supporting some feature is a blocker for anyone here wanting to use Hack, I am more than happy to work with you and make sure the Hack transpiler fits your needs. So please let me know; I'm happy to work with you on getting this tool to where you need it to be, if it isn't there today.

Another feature of Hack is XHP. XHP is sort of like a templating system, but it's much more powerful than that. It's how Facebook renders all of its UI.
So XHP is a way of embedding XML-ish elements directly into your Hack code, manipulating them as objects, defining new elements as actual classes in your code, and so on. It has a lot of the same benefits as existing template engines, such as helping separate your markup from most of your controller logic and making sure that escaping is applied to all unsafe strings. But it also has a lot of benefits that I've not seen in any other template engine in the PHP world, such as Twig or Smarty. So let's look at some examples of what I think is the power of XHP. Here's a simple example: a function, f, that returns an arbitrary XHP element, in particular a UL. If you notice the syntax in the return, it looks an awful lot like embedding HTML or XML straight into your Hack code. We open a UL. This UL has a class attribute; we'll call it "my fun list". It has two items: item one, and then item two, which is the contents of this $text variable, and that's going to be properly escaped, as you'd expect. However, this UL is an actual object, an instance of an actual class, inside your Hack code. That means I can manipulate it as an object, and I can define my own new classes and use them just like I'm using this UL. So let's look at an example of how to define a new XHP class. I'm going to define an FB feed story, and it's going to extend the XHP element base class. I'm glossing over a lot of details about exactly how you extend things; I can go into that later if anyone's interested. But this is the definition of our new feed story. It's going to have an attribute, just like you can have class or style. This one is going to have an attribute called story.
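To make "markup as objects, escaped by default" concrete, here is a toy TypeScript model. It is not XHP and does not use its API; `Elem`, `appendChild`, `escapeText`, and `userText` are hypothetical names, and the text content is a made-up string chosen to show escaping.

```typescript
// Toy model of the XHP idea in TypeScript (not XHP's actual API):
// markup is a tree of objects, and text is escaped when the tree is rendered.
type XChild = Elem | string;

function escapeText(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

class Elem {
  private children: XChild[] = [];

  constructor(private tag: string, private attrs: Record<string, string> = {}) {}

  // Returns `this` so children can be appended in a chain.
  appendChild(child: XChild): this {
    this.children.push(child);
    return this;
  }

  // Attribute values and string children are escaped; nested elements render themselves.
  toString(): string {
    const attrs = Object.keys(this.attrs)
      .map((k) => " " + k + '="' + escapeText(this.attrs[k]) + '"')
      .join("");
    const body = this.children
      .map((c) => (typeof c === "string" ? escapeText(c) : c.toString()))
      .join("");
    return "<" + this.tag + attrs + ">" + body + "</" + this.tag + ">";
  }
}

// Roughly the slide's list: a UL with a class and two items, one from a variable.
const userText = "hi <test>"; // hypothetical untrusted string
const fancyList = new Elem("ul", { class: "my-fun-list" })
  .appendChild(new Elem("li").appendChild("item one"))
  .appendChild(new Elem("li").appendChild(userText));
```

Because `fancyList` is an ordinary object until it is rendered, code can keep adding children or passing it around before `toString()` turns it into HTML.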
It's required, and instead of a class name or a CSS style or anything like that, it takes a feed story object. Then, to render a feed story, I just write some Hack code in this render function. I grab the story attribute, build up this div with a class of story, and put the story's title into this FB title element. So, if you notice, I've defined a new element here, FB feed story, and I'll show you how to use it in a moment. But importantly, I've defined it in terms of another custom element that I've presumably defined elsewhere, called FB title. You can build up these small units of functionality and then compose bigger and bigger elements out of the smaller, reusable components you have defined. That's something I've not seen any other templating engine let you do. Now here's how you actually use the FB feed story I just defined. We have this render function, which renders a feed. It takes an array of feed stories. It starts with an empty feed: $feed = <fb:feed />, which is just an empty feed. But this $feed is an actual object. I can iterate over my stories and add a new child into it, and the new child is going to be the FB feed story I defined on the last slide. I give it the story attribute and stick it in. A lot of templating engines will let you do things like this with a foreach loop inside the definition of the feed, in some sort of domain-specific language. We let you use the language you're already using, Hack or PHP, to manipulate this thing as an object, and then return it, either to continue being manipulated or to be stuck into some larger component. Again, it's the way you're used to thinking about writing code, but applied to manipulating UI elements.
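The composition pattern from the feed slides can be sketched the same way. Again, this is a hypothetical TypeScript model rather than XHP: `Title`, `FeedStoryView`, `Feed`, and `buildFeed` stand in for the slides' FB title, FB feed story, FB feed, and render function. In this sketch the required story attribute becomes a required constructor parameter, so leaving it out is a compile error.

```typescript
// Hypothetical TypeScript model of composing components (not XHP's API).
interface Renderable {
  render(): string;
}

// Stand-in for the talk's feed story object.
interface FeedStory {
  title: string;
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Small reusable component, playing the role of the FB title element.
class Title implements Renderable {
  constructor(private text: string) {}
  render(): string {
    return '<h3 class="title">' + escapeHtml(this.text) + "</h3>";
  }
}

// Plays the role of the FB feed story element: the required `story`
// attribute is a constructor parameter, and rendering is defined in
// terms of the smaller Title component.
class FeedStoryView implements Renderable {
  constructor(private story: FeedStory) {}
  render(): string {
    return '<div class="story">' + new Title(this.story.title).render() + "</div>";
  }
}

// Plays the role of an empty FB feed element: an ordinary object that
// ordinary code can append children to.
class Feed implements Renderable {
  private children: Renderable[] = [];
  appendChild(child: Renderable): void {
    this.children.push(child);
  }
  render(): string {
    return '<div class="feed">' + this.children.map((c) => c.render()).join("") + "</div>";
  }
}

// Builds the feed with a plain loop rather than a template-language loop.
function buildFeed(stories: FeedStory[]): Feed {
  const feed = new Feed();
  for (const story of stories) {
    feed.appendChild(new FeedStoryView(story));
  }
  return feed;
}
```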
The power of this, making small reusable components, defining larger components as compositions of those, and manipulating things in code like in this example, has been invaluable to Facebook once we started using it instead of other templating engines, and I think, at least for complicated UIs, it could be invaluable for all of you as well.

So another feature that I alluded to earlier is Hack collections. Hack collections are a replacement for PHP's arrays. For folks who aren't familiar with PHP, it basically has one data structure: the array. The array in PHP is the kitchen sink. It is sometimes a vector, sometimes a map, sometimes a set. Who knows which one it is? You have to keep that knowledge in your head, because they all behave the same. This also means that standard library functions either have to be told or, usually, have to guess whether you meant a map-like array or a vector-like array. Folks who are comfortable with PHP might know the difference between + and array_merge, and when you'd want to use one or the other. array_merge actually tries to guess whether the array you sent it is a vector-like array or a map-like array. And I really hope the map you sent in didn't have integer keys that were something important, like user IDs, otherwise you're in for a really nasty surprise. So Hack collections separate out all of those concerns by defining new objects: Vector, Map, Set, and so on. Hack collections are actual objects, which also enables some other nice things. So let's look at an example of code written in the functional style with arrays in PHP, and then the equivalent code using Hack collections. We're going to walk through this somewhat weird function called getPostFriends.
It takes the first parameter, the posts, gets the authors of all of those posts, and returns the subset of those authors who are friends with the given user, the second parameter. It's kind of a strange function, but it's just to illustrate how these collections work. We'd like to use maps and filters here. The way I just described it, what I'm probably going to want to do is take these posts, map them to the authors of the posts, and then filter those authors down to the ones who are friends with the user. In functional style, you have to write it the other way around: filter, then map, because you write it inside out. So let's look at the map. array_map first takes the mapping function. Here's the PHP syntax for that: a function that takes a post and returns $post->getAuthor(), and that gets mapped over all the posts. This is a fairly verbose anonymous function, but the next one is even more verbose. To filter these down, another anonymous function takes an author, but it also needs to close over $user: it needs to refer to $user, a parameter defined in the outer scope, which in PHP means an explicit use ($user) clause. It returns $author->isFriendsWith($user), and that's what we filter on. So we've written this inside out, and it's kind of ugly. The anonymous functions are really verbose, but there are also some more subtle issues here. First of all, array_filter and array_map take their arguments in opposite orders. array_map takes the callback first; array_filter takes the array first. That's fairly awkward, particularly when you're reading code like this where the arguments come backwards.
But even more subtly, because PHP arrays like this $posts array are maps and sets and vectors depending on how you use them, does anybody actually know, looking at this code, whether $posts is intended to be a vector or a map? Does this code preserve keys or not? I had to go look at the PHP standard library documentation; I never remember this. It turns out that array_filter and array_map do preserve keys. So this would work fine on map-like posts. It would probably work fine on vector-like posts too, except that after the filter you'll have non-contiguous keys, which may or may not matter. It's just kind of messy. It's something you have to think about a lot, and it's not really clear what's going on. So let's look at the same example with Hack collections, where we've separated out the different kinds of collections and made a lot of things much nicer. Here's the function signature. Again, it's Hack code, getPostFriends, and we're explicitly saying that we take a Vector of posts. This isn't a map. It isn't a set. It's a vector. We're not preserving keys; everything stays contiguous. It says so right in the signature, and the collections APIs and the type checker will make sure you do the right thing. Then we've got the user here. Again, I think about this in terms of map and filter, and that's exactly what we write. Since $posts is an object, it has a map method: $posts->map(), which takes a post and returns $post->getAuthor(). This uses another little feature of Hack called short lambda syntax, $post ==> $post->getAuthor(). It's semantically equivalent to the anonymous function I wrote on the last slide; it's just a lot shorter: a function that takes a post and returns $post->getAuthor(). Then we chain ->filter(), taking an author and returning $author->isFriendsWith($user). With the short lambda syntax, we don't need to explicitly capture $user from the outer scope.
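For comparison, here is the same pipeline as a TypeScript sketch, where arrays have `map` and `filter` methods and arrow functions capture variables from the enclosing scope automatically, much like Hack's short lambdas. `Post`, `User`, `getAuthor`, and `isFriendsWith` are hypothetical stand-ins for the slide's classes. As with a Hack Vector, `filter` returns a new, densely indexed array, so there are no preserved keys to reason about.

```typescript
// Hypothetical stand-ins for the talk's User and Post classes.
class User {
  constructor(public name: string, private friendNames: string[] = []) {}

  isFriendsWith(other: User): boolean {
    return this.friendNames.indexOf(other.name) !== -1;
  }
}

class Post {
  constructor(private author: User) {}

  getAuthor(): User {
    return this.author;
  }
}

// Reads in the same order as the description: map posts to their authors,
// then keep the authors who are friends with `user`. The arrow function
// captures `user` from the enclosing scope without an explicit `use` clause.
function getPostFriends(posts: ReadonlyArray<Post>, user: User): User[] {
  return posts
    .map((post) => post.getAuthor())
    .filter((author) => author.isFriendsWith(user));
}
```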
So this is a lot cleaner: collections are objects, we have short lambda syntax, and we've made a clear separation between vectors and maps.

So that's all I have to say about Hack today. Hopefully I've convinced you that Hack is both a really useful evolution of PHP and a really great language in its own right. You can check out hacklang.org for more details on the language and to download the runtime and the type checker and try them out. All of this is open source; there's a link to GitHub on hacklang.org, so you can check out the source yourself as well. And at this point, I am happy to take questions.

So what percentage of Facebook's code base has already been converted? It depends on how you count. If you count files that just have <?hh at the top, something like 98% or 99%. If you count files that use Hack language features, I would guess most of them, due to the prevalence of async functions. If you look at how much code is covered by the Hack type system, which is something we track fairly closely, then depending on the metric, something like 60% to 70% of our front-end code is type checked. So an awful lot of it.

You mentioned that you can generate PHP from Hack. Would it be feasible to develop in Hack, so you get all of the static typing benefits, but then distribute your code as PHP, sort of like how TypeScript works? Yes, that's actually the intended use case for the Hack transpiler. It's intended for your canonical source, on GitHub or wherever, to be in Hack. So people contributing would probably have to write in Hack, to make sure they're using the type checker and so on.
But when you make a tarball to distribute as a release, or, if you're not running HHVM on your production servers, when you do a push, you would run the transpiler to generate your PHP and then either send your tarball upstream for users who aren't running HHVM, or push it to your servers, or whatever. That's exactly the intended use case. We have five more minutes for questions. Someone on IRC wants to know if anyone's using the transpiler. Not at anything like Wikimedia's scale. I don't want to be dishonest: it's still a little bit experimental, and there are missing features. But if you run into any problems or missing features, I want to make it work for you. Wikimedia was actually one of the groups I had in mind as a likely customer for this eventually, or at least I hoped so. So I'm happy to get whatever resources you need from the Facebook side to make sure it works for you. How many people at Facebook are working on Hack specifically, like the language features? It depends on how you count. There are two and a half or three people working on the type system and the type checker, and I guess I'm counting myself in that. What's that? Six by my account. Six by your account. Oh right, we've got two more. Yeah, so maybe six, depending on how you count that, plus the open source team: there are six people working on HHVM open source, a lot of which ends up being Hack language features, but also HHVM compatibility with PHP. And then there's the performance team, which you probably don't want to count. So anywhere between six and 10 or 12, depending on how you count it. One more question. Can I ask a question? Sure. When you talked about the static type system, you said that if the static checker passes the code, then it cannot produce an error at runtime. So how does that work with dynamic features of PHP, like dynamic function calls, call_user_func, and so on? It doesn't.
So one of the things I glossed over in the talk is that Hack actually has three modes for type checking. The two that are important for this discussion are called partial and strict. All of the code I showed in my slides was actually partial mode. Are my slides still visible, or did I turn that off? So here, this file has the <?hh header at the top, so it's in partial mode by default. Partial mode rules out some of the really egregious dynamic features of PHP, but it allows things like missing type annotations and interoperability with PHP code that isn't typed at all. So in partial mode you don't have the soundness guarantees I was describing. You have the guarantee that whatever is annotated, we will check. But things can be missing, things can be incomplete, things can be dynamic, and we aren't able to check those. So you absolutely can get runtime type errors in partial mode. By egregious features, you mean dynamic functions and such? Sorry, can you say that again? By egregious features, you mean dynamic functions and such? We don't allow dynamic function definitions. I think we still allow things like call_user_func_array in partial mode; we just can't type check that function call. In general, when Hack can't type check something, we assume the programmer knows what they're doing. So a call_user_func call just isn't type checked. You mean you can actually pass an object of the wrong type, and it would only show up at runtime? Correct, because you're using partial mode. So what happens on the receiving side if you get an object of the wrong type? Then you might get a runtime type error, if there's an annotation on the function you're calling that says it expects an object of a different type, just like you would in PHP if you pass something of the wrong type. We do disallow these dynamic features in strict mode, which you turn on by putting // strict after your <?hh opening tag.
Then the type checker goes into strict mode, where it doesn't allow dynamic function calls, among a bunch of other things. Everything has to be annotated. And if all of your code is in strict mode, then you shouldn't be able to get a runtime type error, barring bugs in the type checker itself, which we would treat as bugs to fix. Last question. Would there be a way to call the friends method speculatively, before you know whether it's needed, and then only use the result some of the time? That's not usually how we think about structuring code. Because it's not callback-based, you shouldn't need any of that speculative stuff. It's just: when I need it, I await it. You want strict data dependencies so that you aren't over-fetching data or over-awaiting on it. Because it's so easy to write that code, you fetch data as you need it, and we deal with the parallelism. You do want to structure it in terms of the dependencies, though, so that things that don't depend on each other don't end up blocking each other and causing extra rounds of fetching. OK, we're out of time. Thank you so much. I'll find a way for Josh to answer the remaining questions by email. So thank you so much for coming, and thanks to everyone on IRC.