All right, well, I'm going to try shouting at the microphone so that everybody can hear. Hopefully we'll find an AV guy who can turn it up a little bit for us. But here we go. So the title of my talk is Node.js, JavaScript, and the Future. This is probably the only talk at DrupalCon where you'll hear somebody throw the word node around a lot and not be talking about a pristine piece of content as represented by Drupal in the database. So when I say node, I'm going to try to say Node.js so that no one gets confused, seeing as we're here at DrupalCon. But for the most part, when I talk about node, I'll be talking about Node.js, the, well, we'll talk more about what Node.js is, but Node.js is sort of the programming environment.

So my name is Jeff Miccolis. Most of you who have seen me present before know me generally from the Drupal community as working at Development Seed. Development Seed was a small consulting shop in Washington, D.C. We did a lot of Drupal work for mostly D.C.-type organizations: NGOs, governments. We tended to do a lot of collaboration sites, a lot of open data sites, a lot of the work had sort of an international bent to it, and we did a lot of work with multilingual stuff. These are some of the projects I was most involved with. Open Atrium is a Drupal distribution, a group collaboration suite that we designed as sort of an in-house tool, and we ended up finding out that other people found it useful. It's actively maintained and developed, and we still use it every day internally to manage our work. Features, maybe a little too loud, Features is another module I worked a lot on. This was for capturing configuration and exporting it out of your database. And there are other, more detailed modules that some of you may have heard of and may even still use: Context, Strongarm.

But today, I'm Jeff Miccolis from Mapbox. We're a small company in Washington, D.C., and we do maps. If we were VC-funded, you would say we did a pivot, I guess, but what really happened was we found ourselves focusing on a particular aspect of our work at Development Seed, which was open data, which was visualizations, and most specifically, it was maps. So currently we have a hosted service, which you can see at Mapbox.com, for hosting tile sets that you create, or integrating your site with tile sets that we created from other open data sources. Recently, Foursquare switched to our maps, which was pretty awesome and taught us a lot about just how big the internet is. It's really big. So that's me, and that's what we do at Mapbox. My colleagues Alex and Eric will be giving a talk in the next session after this one, and if you guys know what room it is, shout it out. Well, it's in your schedule. They'll be giving a talk about maps that goes into more detail about how exactly you go about creating maps; that's not really what I'm gonna talk about today. I do owe you all a brief apology, though, because in the description of my talk I sort of implied that I would be getting into the real nitty-gritty of working with Node.js. I wrote that talk, and it wasn't as much fun for me to give, and probably wouldn't have been for you to hear.
And I know a bit about you guys, so I figured there's actually a better spin I could put on telling you all how Node.js works than going through the nitty-gritty of what libraries I like and what libraries I don't like, because the truth is that stuff is fun to explore and it's not that hard, and if you really want to know all of it, you can track me down over the next couple of days and I'll tell you about it. So what I'm gonna do instead is tell three sort of war stories, stories about how you deployed something, it went horribly wrong, and you somehow recovered from it. Three war stories about things in PHP and with Drupal that were pretty hard to deal with, things that I got burned on personally and lost nights and hours and days to, and that would have been easier if I had approached these same problems using Node.js. So three war stories, and then I'm gonna talk about two sort of lessons. Lessons I learned, not exclusively from transitioning from Drupal to Node, but just from moving from one big system to the next: things that are interesting to keep in mind as you change from one system to another, and that tell you a lot about what's the same or different about these two ecosystems, Drupal versus Node.js.

So, to get the whole what-is-Node.js out of the way: I think most of you have probably heard of it before, but it is server-side JavaScript, right? JavaScript on your server. It runs in V8, which is the JavaScript engine from Google Chrome. So it's as if we yanked the JavaScript engine out of Google Chrome and told you to run it on your server. It's a little bit more than that, of course. JavaScript by itself can't touch all the kinds of things you'd want a server-side language to do, like open files and contact servers, and this is what Node.js adds to V8: it gives you an environment where your JavaScript can interact with your server. So that's Node.js in a really, really super quick, 30-second nutshell.
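To make that nutshell slightly more concrete, here's roughly the smallest Node.js program that does something server-like. This sketch isn't from the talk; it just uses the http module that ships with Node core:

```javascript
// An HTTP server in a few lines, using only what Node.js core provides.
var http = require('http');

http.createServer(function (req, res) {
  // This callback runs once per incoming request.
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello from V8 on the server\n');
}).listen(8000);

console.log('Listening on http://localhost:8000/');
```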
All right, first story: sending lots of email. This is something that you always run into, whether it's for spam or not. People ask you to send lots of email, and you've got to figure out a way to make it happen. The way we ran into this problem, not most recently, but probably the longest we had to deal with it, was for Open Atrium. As I mentioned earlier, Open Atrium is a collaboration suite, and of course, with any collaboration suite, you're going to want to notify the people you're collaborating with about the things you're collaborating on. So we have an interface like this in Open Atrium that lets you click on users and notify them, and this is me and my colleague Ian, talking about probably something with servers. You come to a page in Open Atrium, create a piece of content, notify a bunch of people. And when we were building Open Atrium, we wanted a little more flexibility than just sending out email to a couple of people. We wanted to be able to send single emails. We wanted to be able to send digests. And then, of course, once you start selling this kind of stuff to clients, they want more than that. They want to be able to send text messages. They want to ping somebody on Jabber when there's a new thing. They want to post it to Twitter, or whatever the heck else they have decided is crucial to their business.

So to deal with this, we wrote this Drupal module called Messaging. Some snippets from the description on the project page: channel-independent messages. Don't send emails to users, send them messages, delivered by mail, IM, SMS, et cetera. So this was a way to abstract the whole message-sending problem. Now, to get back to where Node.js would have been interesting to use here instead of PHP. In Drupal, the most naive way you can send an email is to do it right when the user hits save. They post the form back, and at that spot you say, great, I'm going to start sending email. Now, if you've sent a lot of email, you know this is a problem, and it's a problem because sending email takes time. For the sake of having easy, big, round numbers, I'm going to say sending a single email from your system takes 50 milliseconds. I did a couple of benchmarks to confirm this is not a totally made-up number; it takes between about 20 and 50 milliseconds to send an email. So if you imagine you're going to send emails to 28 users using just the PHP mail function, you just add these numbers up: 50 milliseconds, 50 milliseconds, 28 times. For me to notify 28 people, which is my entire shop, which is all of Mapbox, it's going to take a second and a half to refresh the page and send all those emails out. That's a little bit of time to wait. It's not crazy. If you're just posting content, you'll deal. But that's how much time it'll take. For 600 users, it starts getting bigger, and in fact we had sites with that many users, and it becomes a problem.

So the solution to this problem, not too difficult: instead of sending when someone hits submit, you move the work later on, to when you run cron. On bigger sites, or actually most of our sites, we ran cron every five minutes, and that gave us five minutes to do all this email-sending work. You can send a lot more emails that way, like a lot, a lot more, and users don't really notice. The emails just go out, and by the time their email client checks, every five minutes, they've gotten the email. So it looks like it's instant. Great. But if anyone has interacted with cron.php under circumstances where it doesn't complete, you know you get stuck in these loops of cron not completing and never really catching up, and this gets bad. That's not really where you wanna be. We got this way with our community Open Atrium site a few times, and if any of you were ever subscribed on that, you noticed how emails would stop working for weeks at a time, and then we'd clear the queue and hit the button again and they would start working again.

So to show you in code what's going on, and then to show you in code what would look different in Node, here's an example of the kind of script you might find on your server if it got hacked. There are three addresses up at the top, and if your server got hacked, the message would come in on a POST variable. A message is gonna go out, and highlighted in yellow there, and I hope you all can read this, I'm sorry if you can't, is the call to the PHP mail function that's gonna email these three suckers about some new opportunity. It's gonna count, and at the end it'll print back to the hacker how many messages went out. So this is pretty straightforward, right?
And what's gonna happen here is that every mail is gonna get sent, one after the next, and if the mail's successful, the counter gets incremented so you can see how many successful messages went out. In Node.js, the same script, written a little bit differently, would look like this. Just as a note, I'm requiring a mail module up top that doesn't exist; I wanted something with exactly the same function signature as the mail call in PHP, and that doesn't exist, so I made one up. But the only really hard thing to understand in this code, if you've worked with JavaScript before, is the require call up top, and that's basically how you pull in other modules, other files, in Node.js. Everything else here is pretty self-explanatory and would look basically the same in a web client.

The yellow code there, though, is a little bit different. You can see, first of all, that this mail call isn't wrapped in an if statement like it was in the last slide, and I'm actually passing it a function. This last argument, this function, is what they call a callback, and it's gonna get called after the mail completes, after the email gets sent out. So what's gonna happen when I run this is that all these emails are gonna get queued up at the same time, one after the other, and it won't be until they're all queued up that Node will actually let go and let the rest of the operating system start handing these messages out to the IO. Then those messages all go out in parallel, and as they complete, they each call this callback function. And after this callback function gets called, whatever, three times here, it'll console.log out for me the number of messages that were sent. So this is probably the biggest difference between Node.js and most of the other languages that are out there. You can do this sort of asynchronous programming elsewhere, but this is what Node.js gives us in a platform that's easy and accessible to use.

To try and make this point again: these are basically the same examples with less code. PHP on the top, Node.js on the bottom. Each one of them ends with basically an echo of "next", printing "next" out to wherever we're running this code. In the top example, in PHP, that "next" is gonna get printed out after all the emails are sent, once they're all gone. In the bottom code, it's gonna get printed out almost immediately, and your program will keep running and sending those emails. It's a completely different way of writing and interacting with code. If you wanna do something after it's done, you have to do it in the callback.
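The slide itself isn't reproduced in this transcript, so here's a rough reconstruction of what that Node.js version looked like, assuming a made-up mail module whose mail(to, subject, message, callback) signature mirrors PHP's mail():

```javascript
// './mail' is the made-up module from the slide; its assumed signature
// mirrors PHP's mail(): mail(to, subject, message, callback).
var mail = require('./mail');

var addresses = ['a@example.com', 'b@example.com', 'c@example.com'];
var sent = 0, done = 0;

addresses.forEach(function (address) {
  // Each call queues a send and returns immediately; the callback
  // fires later, once the upstream mail server has answered.
  mail(address, 'New opportunity', 'Hello there...', function (err) {
    if (!err) sent++;
    if (++done === addresses.length) {
      console.log(sent + ' messages sent');
    }
  });
});

console.log('next'); // prints almost immediately, while the sends are in flight
```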
I'll get back a little bit more to how this works logistically in Node in a bit, but the upside is that you can stop waiting around for actions that involve IO. So in Node.js, let's use some big round numbers again, and here even five milliseconds is probably too high: say putting an email together takes your scripting language, whether it's Node or PHP, five milliseconds, and then you basically spend the rest of the time waiting for whatever server is upstream to tell you, okay, yeah, I'm gonna try to deliver this for you.

Right, so to send a single email, you're still gonna have to wait that full 50 milliseconds. But if you're trying to send more than one email at once, if you're trying to send 28 emails, if I'm trying to email all of Mapbox at one time, I'm gonna spend the same amount of script time putting those emails together, but they're all gonna go out concurrently, so I'll be done about 45 milliseconds after the last one is queued. With 600 users, I'm gonna have to spend three seconds queueing them up, but after those three seconds, I'm done 45 milliseconds later. Compare that to the kind of round numbers we were talking about with PHP: with PHP, just based on our access to the IO, I could send that many emails in 30 seconds, but in Node.js I can do it in three.

Okay, so this is the idea: this asynchronous way of interacting with your operating system and your IO is the main thing that Node.js gives you. Next I wanna try a demo, and normally demos are really risky. This one is super duper risky because I'm actually gonna involve people, so I need four volunteers. You're not gonna do anything involving public speaking. You, you, you, and you, yeah, come up here. All right, great, so you three guys come over here. You're gonna wait for a second. You'll be my IO when I'm Node.js. You're gonna get to be my IO when I'm PHP. So I'm a PHP script, and to make this fair: I've got some cards here from people in the middle of the audience, my IO, and instead of sending emails to people, I'm gonna have you return their cards to them. They all know who they are, they know they gave me their cards, so they shouldn't be that hard to find if you say what company they work at. So just try and get each card back to its owner. Yeah, just right in the middle, just call out their name. Right, he's starting at 6:02 and 30 seconds, and we'll see how long it takes him. You've got three of these to do. So this is one after the other; this is synchronous IO, right? We have to wait. You all see how much he's running, so you guys watch that. All right, great. It took you like 40 seconds to send my emails. Thank you, I appreciate it.

All right, you guys heard the same instructions? Yeah, I'm gonna wait till my clock has something round on it. Just give me a couple seconds. Okay, Adam, go. Somewhere in the middle, there's somebody. Right, so we send them all out at once. And to keep it fair, you guys should come back up here after you've delivered it. Yeah, yeah, come back up, to keep it fair. And you took your sweet time, there was no running there. You did that in 20 seconds. So right, there's the difference. Thanks, guys, appreciate it. I know that is sort of a bludgeoning-you-with-a-hammer demonstration of the difference between synchronous and asynchronous IO, but I hope it gets the point across. When you're doing things synchronously, it almost doesn't matter how fast your script is or how fast your computer is. If you're waiting on IO to do stuff for you, which the operating system is managing, and which generally speaking isn't your bottleneck, you could do it faster by just telling your IO to do a little more work.

Okay, so there's my email story. Second story is feeds. One of the other big projects that Development Seed did was this news aggregation system called Managing News.
And as you can see, it was a big, sort of complicated interface that aggregated a lot of news stories. It geolocated stuff, and actually this slide is kind of cool, because I think this is one of the first maps we developed and rendered on a completely in-house stack, and this is from 2008 or 2009. And we have a big tag cloud down here of tags that were extracted from the feeds that were coming in. So the task that Managing News had was to fetch a lot of RSS feeds, and in the cases where we knew we could actually get the original article from the source of the feed, we would go out and try to do that. We went through various iterations of various tagging systems, like OpenCalais. We had a keyword whitelist sort of system. We had this crazy titles-of-pages-on-Wikipedia system. And we went through even more geocoders than I want to try listing out. So the task of Managing News was: fetch a lot of feeds, do a lot of processing on them with third-party services, and then try to present that back in a cogent way to the user.

The scale of Managing News is what really made it interesting. Any instance of Managing News would have thousands and thousands of feeds. There would be millions and millions of RSS items and gigs of data. The testing databases we used for Managing News would be on the order of four and a half gigs, and this was the balance between something you could kind of manage in your local environment and something big enough to simulate what a real site was like. So it was huge. And the solution to dealing with a lot of IO that I talked about with email failed horribly for us in Managing News. There was just not enough time in an hour, or in a day, to guarantee that you would hit all of those feeds. We wanted to make sure that at least twice a day we would check a feed that you had given the system. Ideally we would do it a lot more than once or twice a day, but we needed to guarantee at least that, and cron just couldn't get there for us.

So we took inspiration from one of my colleagues' dogs; her name is Maggie. There are these stories of dogs that will go fetch the paper for you, and it's a nice example of a little bit of asynchronousness that your dog can take care of for you. You don't have to walk out and get the paper; you can keep drinking your coffee, and the dog brings you the paper. So we took inspiration from Maggie and we wrote maggied, which was a multi-threaded Python daemon. This was before Node.js existed, and what we really wanted was just a way to access more IO. Like, how do we do more? What's a good multi-threaded environment? We thought about just spawning up a lot of crons and trying to have them not mess with each other, and decided that was a bad idea. So we wrote this Python daemon, and the Python daemon would access the Drupal database and do stuff for us. What it ended up looking like was four, we called them retrievers, to keep the dog metaphor going. Four retrievers would get batches of 50 items, 50 news stories that were new, and they would do a couple of tasks. They would attempt to fetch the original story from Yahoo or wherever. They would throw it against the third-party tagging service. And then they would try to geocode those items. So we'd throw up a few of these workers and they would all go do this.
So, the times it took to do this kind of stuff, and these are from memory, so they're probably somewhat off, but I think they're in the right ballpark. Retrieving an original story, so loading someone's webpage and scraping it in Python: meh, about 300 milliseconds. Some websites are a lot faster, some are a lot slower. And then, assuming that our tagging and geocoding third-party services are relatively responsive, let's say they can each do their job in 100 milliseconds. So the total for each of these threads to deal with one story would be about half a second. Now, the issue with that half second is not that it's too long; it's that during that time my server is not doing a darn thing. It's just waiting for these APIs to respond. So all of that time I've got this excess server capacity on these big beefy boxes built to handle really big databases, and it's just not doing anything.

The obvious next step here could be lots of dogs, lots of retrievers. Instead of running four, run 50, and have them all go out, each with one story, trying to do all this. For various reasons that didn't end up working for us, although theoretically it could work. And in retrospect, I have to question whether that was really even the best idea in the first place. Having done some work in Node.js now, if I were to go back to this problem of how do I interact with all these third-party APIs and get them to do stuff fast for me, I would replace the retrievers with a single hyperactive squid. I really wish I had come up with the hyperactive squid line myself, but if you Google "hyperactive squid", and I suggest that you do, the first result will be an awesome presentation about hamsters, bunnies, and squids as a metaphor for how Node.js works on your server. Don't do it now, but remember hyperactive squid; Google it later. The gist of that presentation goes like this. The squid runs up and down as fast as it can, dealing with every item in turn. Every item here would be every bit of IO: every time it has to talk to a third-party service, it runs up and down as fast as it can and talks to all those guys. It fires off any long-running IO operations and then moves on to the next one. As it gets to each port, each IO activity, it says: did anything come back here? Yes, no, fine, next one, just let me know when something comes back. When an IO operation reports progress, it does a little more work and then moves on to the next.

So you've got this single process. In Node.js this is called the event loop. This single process is looping around all the time. When I fire off a Node script, I can do one thing at a time inside that Node script. It's not multi-threaded, it's not multi-anything. There's a single event loop, and I have this hyperactive squid in there that just keeps running around, checking as much IO as it can and only doing something when something comes back from IO.
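To sketch what replacing the retrievers with that single squid might look like: this isn't code from the talk, just a minimal example using the contributed request library (which comes up again in a minute), fetching a batch of hypothetical story URLs concurrently. The tagging and geocoding steps would be more calls of the same shape:

```javascript
// A single Node.js process fetching a whole batch of stories at once.
var request = require('request');

// Hypothetical batch of story URLs pulled from the feeds table.
var stories = [
  'http://news.example.com/story-1',
  'http://news.example.com/story-2',
  'http://news.example.com/story-3'
];

var remaining = stories.length;

stories.forEach(function (url) {
  // All of these requests are queued immediately and run concurrently;
  // the event loop just reacts to whichever response comes back first.
  request(url, function (err, response, body) {
    if (!err && response.statusCode === 200) {
      // ...scrape the body, then hand it to tagging and geocoding here...
      console.log(url + ': got ' + body.length + ' bytes');
    }
    if (--remaining === 0) console.log('batch done');
  });
});
```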
And the way this can work, the way Node.js lets us do this, is because anything that leaves V8, anything that leaves the JavaScript interpreter, anything that does more than munge two strings together or do some division, takes a callback. If you ask it to go do something, it's gonna say: okay, okay, I'll go do it, but you have to let me know what I'm supposed to do when I'm done. And so it hands the work off to the operating system, to the IO, to go contact a server or whatever, and it'll only come back and bother the event loop, your main application loop, when that's happened. The kind of stuff that leaves V8, the kind of stuff that isn't string munging and division, is gonna be this stuff: file system stuff, network stuff, standard IO stuff, timers, like wait five seconds and then do this, and things that interact with child processes. This is a whole lot of stuff that in PHP looks no different from doing division or calling a function that takes two bits of string and smushes them together or splits them apart. But in Node.js this looks different. It looks a lot different, because it has to be async.

Now, the way this looks is kind of like this. Again, I hope you all can read this. The top example I've got there is what it looks like in Node to read a file. I'm doing a require here, and this time it's for a real library; it's part of what Node.js provides in its core, the file system library. And I'm gonna read from a file, let's see, /etc/passwd. Once the IO has been able to load all of that off disk and has it available to me, that function I pass at the end there is gonna be called, and I just log the contents out to my console. So the code that's indented there is not going to be called until the file's completely read, and in the meantime, my Node.js process is free to do whatever the heck else I'm asking it to do. The other example, on the bottom, is using a wonderful contributed library called Request, which greatly simplifies grabbing external URLs, to fetch example.com. You can see it's very, very similar to how the fs.readFile example works. I have to give it a callback, and the first thing I get back in my callback is an error. This is one of the standard things about callbacks in Node: I check to see if I have an error; if I don't have an error, I check to see if I have an HTTP error, like did I get a 200 back? If so, I log the response out. So this is what the asynchronous code looks like.
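The slides aren't reproduced here, so this is a reconstruction of those two examples as described; the exact file path and URL are guesses from the description:

```javascript
// Core file system library; ships with Node.js.
var fs = require('fs');

// The indented callback doesn't run until the whole file has been
// read off disk; the process is free to do other work in the meantime.
fs.readFile('/etc/passwd', 'utf8', function (err, data) {
  if (err) throw err;
  console.log(data);
});

// Contributed library that greatly simplifies fetching external URLs.
var request = require('request');

request('http://example.com/', function (err, response, body) {
  if (err) throw err;                // transport-level error?
  if (response.statusCode === 200) { // HTTP-level success?
    console.log(body);
  }
});
```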
And the benefit of doing things this way is that you get to ask your machine to do more work. When you move to coding things asynchronously, for a problem like we had with the maggied Python daemon, the limiting factors start to change in really interesting ways. Your limiting factors with Node look like: just how much stuff can you grab at once? How many connections is your server gonna let you open? How much bandwidth is your upstream provider gonna let you use? And how fast can you actually queue up these requests to send them out? Coding asynchronously with Node doesn't make your server any faster; there are still big limits you face based on whatever your hardware is, but they change. The classic limitation with PHP is the amount of memory you have versus how many different PHP processes you can fire up running Drupal. If every PHP process you have takes 50 megs, that's gonna be a significant limiting factor on how much stuff you can do at once. With Node.js, you don't really run into that kind of problem; you run into these other kinds of problems.

All right, third war story: big files and long sessions. By big here, I mean gigabytes. By long, I mean hours. And by sessions, I mean HTTP sessions. I'm not talking about cookie-driven client sessions that last an hour or whatever; I'm talking about having open HTTP connections that last hours. If you've done PHP development for any period of time, you have a whole category of stories around these sorts of issues. If you've been coding PHP for a long time, you might remember when people tried to upload really big files using a modem, and how there was always this threat that if you had too many users with modems contacting your site, they could inadvertently denial-of-service you, because they sucked up all your Apache threads just by keeping them open. And of course with uploading large files, we'll get there, there are these four configuration options that I'm sure everyone is familiar with when you're trying to enable big file uploads: upload_max_filesize and post_max_size, which are sort of self-explanatory, like how much information do you want PHP to accept at once; and then max_input_time and max_execution_time, which are basically how long you want your PHP process to be allowed to run.

Now, on all of the Drupal sites we ever deployed, you always end up pushing these numbers a little bit higher, a little bit higher, a little bit higher, to try and allow more, and you might conditionally push them higher on certain paths, or in your code under certain circumstances, to allow bigger files. You can push this approach pretty far. I would say it caps out around half a gig if you want to not expose yourself to too much. You can push it much further, you can basically let your processes run forever, but it becomes very difficult to keep that under control. You open the door to denial-of-service attacks, and in a more serious, programmer-ish way, you start tolerating application bloat. It becomes okay for your scripts to run longer for some reason. Problems in production can take a little bit longer to show up. And you're still never gonna get those really huge uploads. You're just never going to get there: you need to be able to put that whole file in memory, and that just might never happen.

So the classic solution here is to look elsewhere for really big file uploads. As I was preparing for this talk, I saw a little bit of chatter, maybe six months ago, about whether there could be a way to use Node.js to handle this for Drupal. I don't know whether that came to anything; I'd be interested if anybody knows. But basically PHP makes you look at various ways to chop the upload up or to handle it elsewhere, whether on the client using a Flash uploader, or on the server using some sort of Java servlet to handle the uploads for you. And if you talk to developers, you know, after you give them a beer or two, they might say things like: if only we could stream. If only we could just stream the file into PHP, it would be great. Invoking images of pristine streams in the forest and things just working magically. Streaming data looks a little bit different than that, as we all know. This is the best picture of a bucket brigade I could find, but basically that's how the internet works: you get one bucket of data handed to you at a time, and then you do something with that bucket.
It just so happens that PHP, when you're doing file uploads, gets handed all those buckets at once. So if we could stream, if we could stream the data in PHP, you would have the flexibility of doing things bucket by bucket, chunk by chunk, buffer by buffer, as the data comes in. You wouldn't need to fill your memory up, or even allow your PHP process to fill your memory up; you could just handle a couple K at a time and do something with it. And you can do really interesting things when you can stream your data like this. Sure, you can just stream it off to disk immediately; great, pretty easy. You can do other, more interesting things, like instead of saving it locally to your disk, take the incoming data, and while it's still coming in, also start sending it off to S3, so that you never worry about filling up your local disk, but people can do really big friggin' uploads and you just push them off to S3. If you're doing more complex media stuff, you could send it off to be real-time transcoded, or other sort of super crazy things.

Now, of course, I bring this up because in Node.js this becomes a lot more possible. This code example is a little more dense, so I'm not gonna go through it line by line, but I'm using a library here called Formidable, which is a great library written by the people at this company called Transloadit. What they do is offer a service for uploading really big files that you can plug into your website using a very simple API. Sounds like the kind of thing that could be useful, and thankfully they've open sourced their main library for handling incoming file uploads, and incoming POSTs in general. It's called Formidable. The second library I'm requiring here is a core one, part of Node.js proper: http, which is an HTTP server. And on, I guess technically, the fourth line there, I'm creating a new HTTP server. Node.js doesn't run embedded in another server like PHP does; it doesn't run embedded in Apache, doesn't run embedded in Nginx, or as part of some CGI thing. You actually write a Node.js server, and it takes that much code. So I'm writing a server here, and in order to keep my server limited in scope, I'm checking to make sure that the URL coming in is /upload and that it's a POST. So my server in this example will only respond to POST requests at the upload endpoint. I'm using Formidable to parse the upload, and then what's interesting is the stuff I have highlighted in yellow: I'm adding a listener, basically a callback that's gonna get run every time I get a new chunk of data, and I'm just logging it straight out to my console. But in here I could do anything. I could start writing it to disk. I could start pushing it off to S3. I could do analysis, or, I don't know, maybe something crazy like re-encoding the file as it's coming through. This is a way to access an HTTP request as it's happening that you simply can't do in PHP. In PHP you get things on either end of this, but here I can access the request as it's happening, right in the middle, and decide to do things. And Node.js handles all the nitty-gritty of making sure that you get your packets in order and that things are called one after the other, not in some random order as data comes in. So this opens up a lot of possibilities.
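Again, the slide isn't reproduced here; this is a sketch along the lines of what's described, using Formidable's onPart hook to watch chunks arrive (the endpoint and port are assumptions):

```javascript
var formidable = require('formidable'),
    http = require('http');

http.createServer(function (req, res) {
  // Only respond to POSTs at the /upload endpoint.
  if (req.url === '/upload' && req.method.toLowerCase() === 'post') {
    var form = new formidable.IncomingForm();

    // Handle each incoming part ourselves instead of letting Formidable
    // buffer it; this listener runs for every chunk as it streams in.
    form.onPart = function (part) {
      part.on('data', function (chunk) {
        // Here we could write to disk, push to S3, transcode, etc.
        console.log('got ' + chunk.length + ' bytes');
      });
    };

    form.parse(req, function (err) {
      res.writeHead(200, {'content-type': 'text/plain'});
      res.end('upload received\n');
    });
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(8080);
```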
Now I'm gonna go off on a little side note here about a particular thing that I think is really cool, that I wanted to share with everybody, that you can do when you're streaming data. There's a project called CouchDB. It's a schemaless document store database; its logo is a couch. And it has this really cool feature, among other features, called a changes feed. It's over HTTP, and it allows you to basically set up a connection to your database, and it'll just, over HTTP, send you new stuff as it happens on the database. As people make changes to the database, as they delete stuff from the database, you'll just start getting these little JSON snippets of: by the way, this document changed. And if you've subscribed to the endpoint with a certain query string parameter, it'll give you the new version of everything. So you can connect to a database, and as the database changes, over HTTP, it will feed you new information about what it has. From a developer's perspective this is totally cool: you can subscribe to your database over HTTP and see stuff.
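A minimal sketch of following that changes feed from Node.js, using only core http. The database name here is made up; feed=continuous and include_docs=true are the CouchDB query parameters for a long-lived feed that includes the new version of each changed document:

```javascript
var http = require('http');

// Keep a long-lived connection open to CouchDB's _changes feed.
http.get({
  host: 'localhost',
  port: 5984,
  path: '/uploads/_changes?feed=continuous&include_docs=true'
}, function (res) {
  var buffer = '';
  // CouchDB sends one JSON object per line as changes happen.
  res.on('data', function (chunk) {
    buffer += chunk.toString();
    var lines = buffer.split('\n');
    buffer = lines.pop(); // hold on to any partial line
    lines.forEach(function (line) {
      if (!line.trim()) return; // skip heartbeat blank lines
      var change = JSON.parse(line);
      console.log('changed:', change.id);
      // A downloader process could start fetching the new file here.
    });
  });
});
```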
So, not only is it kind of neat; we use it on Mapbox hosting. On Mapbox hosting, we allow people to upload files up to five gigabytes. These uploads are in a format called MBTiles, and they're maps. They're maps of some portion, or maybe the entirety, of the world, and they're big, they're gigabytes, and we need to get them uploaded and then get them onto all of our web heads. The one little thing that might be confusing about this is that we're not actually uploading directly onto our servers. We use S3 as the canonical store for our data right now, so those uploads actually go directly to S3. But on our front ends, when an upload completes, we also save a record into CouchDB: by the way, there's this new file, for this user, at this S3 URI. And then that record gets propagated to all of our web heads, and on those web heads there's a single Node.js process running called the downloader. It's subscribed to CouchDB, and it sits there listening, waiting for new files. It's a very, very long-lived Node.js process; this guy runs for days and days and days, just waiting for new files to get uploaded. It sees that they've been uploaded, and then it starts downloading them. So all it does is manage these two asynchronous connections. It runs for days on end, all over HTTP, all very simple to understand, and also very simple to debug. It's pretty cool.

So what we found is that when you move to a system that does this non-blocking IO and has a single event loop, the parameters of how you solve problems can change a lot. It becomes easier to write smaller programs that are connected in particular ways, over the interfaces we're all familiar with, over HTTP, using JSON. And it just kind of works like that, and it has changed how we approach a lot of problems.

Okay, those are my three war stories. I wanna talk about two more things, the sort of interesting lessons I learned moving from project to project. First, package management. And by package management, I basically mean things like Drush Make, package management for Drupal. Drush Make is a really awesome project. If you aren't using it when building Drupal sites, you should be. It's wonderful. It will make your life easier when you have to revisit an old project or apply security updates; you should look at it. Okay, one of the nice things about how Drush Make works in the larger Drupal ecosystem is that it can rely on a few things that make it this useful. The one really, really interesting thing is that there are a lot of Drupal projects, a lot, but we tend not to have really bad namespace conflicts. This is because all these Drupal modules need to claim a namespace on Drupal.org. There's a canonical spot where, if you grab the namespace for a word, you've got it. You can use it in your function calls and you're pretty much okay. The other thing that's really super nice about how Drupal.org manages projects is that we have a really inclusive project policy. Yes, there's a bar to getting an account and making a project, but it's a pretty low bar. If you can demonstrate that you're willing to play by the rules, you get an account, you can make projects, you can make a lot of projects, and you can contribute, and have people to contribute with, very easily. And so having a lot of the projects in one spot, with the naming issues resolved by that spot, makes Drush Make really easy to use, and really kind of a way to stay sane. It's part of Drush proper now, which is great. This means that if you're using Drush, you already have it. So, like Adam mentioned, you should use it.

But one issue, for me, is that by the time Drush Make had emerged on the scene and become something that I could use in my everyday work, I had been using Drupal for years, like, years, since 4.6. And I can't help thinking it would have been nice if there were another project that did package management for PHP that had solved some of these problems for us ahead of time. So, yes: PEAR. PEAR is the PHP Extension and Application Repository. It wasn't around since the beginning of PHP; PHP started around '94 and PEAR's been around since about '99. But for me, it's been there since I started using PHP. And I checked the documentation the other day as I was preparing this talk, and it seems like this has changed a little bit: the threshold for getting new projects into PEAR, they don't talk about it being as high as they used to. They used to talk about it being like, you need to have a real project with real commitments that does the real things that real programmers want. But I can't help but imagine: what if PEAR wasn't like that? What if PEAR was wildly inclusive, awesomely useful, as easy to use and straightforward as Drush Make is? And what if it was awesomely successful? That would have made a lot of our lives building Drupal sites a lot easier, and it would have given the Drupal community something reasonable to leverage. Now, I bring this up not because I think Drupal should use PEAR; I do not. I think the Drush Make tooling is awesome. I bring this up because NPM, the Node package manager, is wildly inclusive, it is awesomely useful, it is awesomely successful, and it does these things for Node.js now, right out of the box. It was there with Node as soon as I started thinking about using Node.
So: Node package manager, something you should know about even if you're just experimenting with Node. It is the canonical spot for namespaces and for packages, and it is very easy to get an account there. It is very easy to publish a project there. I think the only way you can't publish a project there is if you previously published a project and it was some sort of security vulnerability, or it was abusing the people that downloaded it. Like, you have to actually get kicked out. What this means is there's a lot of stuff there. By comparison, and just because I had numbers to collect: PEAR has 584 projects. Drupal.org has a whopping 15,000 modules, 3,600 of which are for D7. NPM had 7,976 as of some point last week. And the interesting thing to keep in mind is that NPM is only two years old. It has pretty much half as many projects as Drupal.org does, but it's a lot younger. It's pretty, pretty interesting to see that kind of explosive growth. NPM's package.json, just so you all see it, should look very, very similar to a reformatted Drush Make file. Same information: compatibility with what version of Node, the dependencies you can point at, a version number, a URL of a tarball. Basically the same kind of stuff you have in a Drush Make file, and just like you love Drush Make, you will love NPM.
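For reference, a small, hypothetical package.json of roughly the shape described; the name and version ranges here are illustrative, not from the talk:

```json
{
  "name": "my-app",
  "version": "0.1.0",
  "description": "Hypothetical example of the fields discussed",
  "engines": { "node": ">=0.6.0" },
  "dependencies": {
    "request": "~2.9.0",
    "formidable": "~1.0.0"
  }
}
```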
Okay, so package management is one of the very interesting things that, late in my career with Drupal, we actually got right, in a very useful way, and it's something that whenever I look at a new system now, I'm always asking: do they have reasonable package management? With Node.js I found that, in a really good way.

Okay, second lesson, fifth point: nice hammer. There's always kind of a problem with a technology you like a lot, and that's that, well, I'm sure I've given Node.js a good reputation here, and I'm certainly over-excited about the difference between asynchronous and synchronous programming, and some of you might think it's just gonna work for everything. Right. This is something you wanna watch out for when you're building a lot with a system and you really like that system. This isn't a problem with Drupal or Node.js specifically, but it's something I try to keep in mind as I work with either system. So, Node.js: not good for some stuff. Computationally heavy tasks. Stuff where you're doing a lot of math, a lot of crypto. This is just not the kind of stuff you want Node to do. The reason is that every operation you do in JavaScript and Node that doesn't take a callback is blocking the event loop. Your event loop can do one thing at a time. If you give it some really crazy computation that's gonna hog your CPU for a second, that means the rest of your server is unresponsive for a second, and things are just piling up in your IO queues. So, computationally heavy tasks, and for the very same reason, databases. In the Node.js community, databases are the classic example of: if you think you're gonna start building a database with Node, please stop. Okay, so those are the sorts of tasks you don't wanna use Node.js for. What it's really awesome at, what it's really great for, what it was designed for, in fact, is interacting with other services.

It wasn't designed as some big monolithic thing that you should run on big hardware with lots of memory and lots of this and lots of that. For the first couple of years that Node.js existed, V8 had a one-gigabyte memory limit that you couldn't do anything about. So even if you had 32 gigs of memory, you were stuck using one. That's changed a little bit now; you can grab more memory in a single Node.js process, but the main goal of the project hasn't changed. It's not to build something that grabs a lot of memory and uses your system really intensively. It's something to queue things up and interact with other services. These services can be things like databases, mail servers, other web services, like my examples of tagging services and geocoding services, and also your web clients. If you start doing things with libraries like Socket.IO, where you have persistent connections open to your browsers for really long amounts of time, those clients start behaving the same way: they become, basically, services that you can interact with, and periodically you send data one way or the other down the pipe, down the IO.

Because there are people smarter than me out there, I'm gonna pilfer a couple of points from other people's posts about Node.js. The first two are from substack.net; the other is from this guy whose blog name I can't pronounce for anything, nelhage.com, maybe. These are just some interesting general observations about Node.js that I wanted to leave you with. There's an interesting way that people are writing modules for Node.js, and Node.js modules are those things that I was requiring into place earlier. The interesting difference from something like Drupal is that the primary focus of most of these modules is on using, not extending. On using what's given to you, not extending it in an object-oriented fashion or calling its hooks to do things. And a big part of what makes Node modules so great is how they tend to have really obvious entry points, as a consequence of focusing on usability and limited surface area. Many, many, many of the Node.js projects are like this: you require in a single function that does one thing, but basically handles all the nitty-gritty around that one thing. This would be like the Request example that I used to read from example.com: it has a very small surface area, but takes care of all the problems around making requests to other services.

Callback austerity. I would maybe rephrase this myself as callback ubiquity, but the quote here is a little bit roundabout: instead of the HTTP server being an external service we configure, the HTTP server becomes just another tool in our arsenal. Now, the reason this is possible, the reason we can invert the relationship that we have in PHP, is because of how callbacks work, and how they're austere and basic. You can feed a callback to just about anything that interacts with any service. The function signature is gonna be, 70% of the time, almost exactly the same: the first argument is gonna be an error object, which could be null or could have an error in it; the second is gonna be the data that came back from the response. And because those callbacks are so austere and so ubiquitous, you can invert a lot of relationships.
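To make that social contract concrete, here's a hypothetical sketch, not from the talk, of writing your own function that follows the same (err, result) convention:

```javascript
var fs = require('fs');

// A function of our own that honors the same contract: the callback gets
// an error (or null) as its first argument, and the data as its second.
function loadConfig(path, callback) {
  fs.readFile(path, 'utf8', function (err, data) {
    if (err) return callback(err); // pass failures up the chain
    var parsed;
    try {
      parsed = JSON.parse(data);
    } catch (parseErr) {
      return callback(parseErr);   // bad JSON is an error too
    }
    callback(null, parsed);        // success: the error slot is null
  });
}

loadConfig('./settings.json', function (err, config) {
  if (err) return console.error('could not load config:', err.message);
  console.log('loaded', config);
});
```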
You can make it a lot easier to prop up a server, or make requests in a very convoluted way, and you don't actually have to know how those requests are convoluted or how difficult the system is, because there's this sort of social contract: all you have to do is give a callback, look for an error in the first argument, and look for your data in the second one. And to piggyback off that: because everything is async by default, and because callbacks are the ubiquitous way of interacting with all of these third-party libraries, all 7,000-odd things in NPM, you can use nearly all of those libraries directly. One of the classic problems you run into in PHP is that if you wanna use someone else's PHP library within Drupal, chances are you need to write code to turn that library into something you can use in Drupal, or it's written in such a way that it doesn't even make sense to write that glue code; you might as well just re-code the functionality as a Drupal module. This is really tedious, and even if PEAR had been wildly inclusive and wildly, wildly successful, this is what would have prevented you from using those libraries natively, sort of nakedly, a lot of the time. You're always writing these intermediary things to manage that translation for you. What's really interesting about Node.js is that I don't run into this. If there's a Node.js library that is well enough written and does the functionality it advertises, I can basically just include it in my program and run with it. It just works, which is kind of wild and kind of interesting.

So: I think you should try using Node.js on a project. If you have particular questions about libraries to recommend, or how you would do this or that, or how to avoid certain problems, I'll be around; feel free to ask me. I think we have a couple of minutes for questions, so if you wanna queue up, we can do that. But I bet you'll like it. So, thanks everybody.