We could be starting — it's one o'clock now, so let's go. I'm here to talk about turbocharging your Drupal syndication with Node.js. This is going to be a bit of a story: what our requirements were, where we went with them, and where we're aiming next. So it's more of a case study, but we'll go into some detail as well.

What I expect you to already know — what I won't be explaining — is some PHP and JavaScript, some understanding of Drupal's internals and modules (we'll talk about a few of them), and some idea of how NoSQL stores and Solr work. I'm not sure I'll even mention Solr much — I don't know why I put that on the slide — but it's good to know how it works; you might need it. What I'll try to help you learn is how you can use Node.js in real life to accelerate your content delivery. I've seen some Node.js presentations before — I think I was in one at DrupalCon Munich — and it kind of sidestepped delivering content from Drupal, so it didn't really explain anything about delivering Drupal content with Node.js. We're going to talk about exactly that. Then we'll talk about why you'd need to accelerate your content delivery: we had some specific needs, and I think they should be pretty common in this kind of situation. And then we'll briefly touch on how Node.js scales compared to Drupal. It's comparing apples to oranges, but you'll still get an idea of what it's all about.

My name is Kalle Varisvirta. I work for Exove, which is right there on top, and you can read about Exove on the slide — we're about 70 people, based mostly in Finland currently. I've been working with Drupal professionally since 2007, and I remember evaluating Drupal 3 back in the day, before we decided to build our own CMS — which is always a bad decision, don't do that. Especially these days, you should never build your own CMS.

The project all of this comes from is a service built for Nelonen, a Finnish TV broadcaster. It's owned by Sanoma. Sanoma owns a bunch of TV channels in Finland, and they also operate here in the Netherlands — they own SBS. I don't know how big or small it is here, but apparently it hasn't been that good an investment so far. Anyway, our system is used only in the Finnish operations; there's been some talk about whether it could work for SBS as well.

The platform runs their video service online. It was originally a catch-up service: they sell ads and show their TV content online for a seven- or fourteen-day catch-up window, like video on demand. Later they added subscriptions — SVOD, subscription video on demand — where you pay for a subscription and can watch content whenever you like, usually for a much longer period; those contracts are typically two or three years. It's more like Netflix in that way. They also stream live sports, essentially the Finnish ice hockey league, which in Finland is probably comparable to soccer here in the Netherlands. Finns really like their sport; it's watched quite a lot, and it's quite expensive. I'm not a fan of ice hockey for some reason — I guess I'm a bad Finn that way — and I wouldn't pay what people have to pay to watch it. Then again, I don't have to pay for watching.
All right, the architecture. We have a video content management system, which is a Drupal 7. Then there's linear TV programming — that's what they call the TV you have at home, where you can turn on a channel. Both the linear TV programming and uploaded content are fed into the system. Linear TV works in such a way that it comes from their ERP system into the content management system, and what the CMS does with that content is deliver it downstream, where there are multiple clients: their websites, their affiliate sites, and so on. Sanoma is a big corporation with a lot of newspapers and magazines, and they all want to use videos, so they mostly use this platform for that. There are also iOS clients, smart TV apps and the like on the downstream side.

For videos, the system only handles metadata. You can upload videos, but most of them don't come in that way — they come from an automated system, their video production system. We handle the metadata, obviously; we don't handle video files in Drupal. You upload a file, we pass it on, and we never touch it again. From there it goes to a video binary handling system, which is also built by us, but it isn't Drupal. The CMS handles all the metadata: where the video appears, whether it's connected to a product, who owns it, the description, the images it has — the still image and so on. And the rights, which are obviously very important: the rights define how long you can watch it, where, and so on. It also handles resolutions, bit rates, that kind of thing.

Videos are streamed from multiple streaming locations, as directed by the CMS. Currently they stream from Nelonen's own streamers — they have a couple of servers that can stream content, though they're not that good these days, I think — and all the live content and the really popular content comes from Akamai, because their own servers can only stream about 20 gigabits per second, and last Friday they were running at 32. You can't do that with 20 as your limit, so it has to come from Akamai — especially the live ice hockey; the finals, for instance, are hugely popular.

So this is how it originally looked: a video content management system in the middle, streaming content to the two sites that existed at that point. The Drupal in the middle focuses solely on content management. It doesn't deliver any HTML; it delivers JSON feeds out to the downstream. It's headless Drupal, if you want — that's been a buzzword lately. You only use the admin side, and that's fine. There are custom modules for integrating with the TV ERP system — which I think was built in 1997, so I'm not going to comment on that system at all, but we get XML from it with the linear TV programming three or four months ahead, so we can receive quite a lot of broadcasts and air dates at once. Drupal also uses a custom module for controlling the video binary management system, which manages the actual video files: Drupal tells it under what name and onto which streamer a file goes, and so on, and the binary management system does what it's told. And then Drupal makes the videos ready.
Making a video ready means Drupal checks that the file is actually on the streamer and then creates a node to mark the video as playable — a ready-media node. That's the signal that the video can be played: it's on the streamer. For news clips going through the pipeline that's essential, because only once a clip has finished conversion and landed on the streamer do we know it can be played.

So we had this, and everything went great. Then we got some more downstream clients, and then some more, and then a couple more, and by the time we realized it — no, we really can't serve all of this with Drupal. It's not like we hadn't thought about this at all when we started. We had built the Drupal 7 on MongoDB field storage, so in a sense it was sitting on a fast database: MongoDB field storage stores the field data in MongoDB, as you may know, and MongoDB, by its nature, can store it in a more sensible way than MySQL, so it's faster at that. The feeds we were serving were built with Views and a JSON output module, backed by Solr so they wouldn't burden the database.

About the field storage, though: it is faster, but it's not really compatible with Views unless you go through EntityFieldQuery, and EFQ Views, although it works well with MySQL, doesn't work that well with MongoDB. We moved away from MongoDB field storage at some point. I'd say that unless you know exactly what you need from Views and there's no sudden urge to create new views, you might be able to pull it off with MongoDB field storage — but we didn't, so I can't recommend it. There were some serious bugs, too: booleans didn't work at all, and there was all sorts of trouble between the modules in the chain — EFQ Views, EntityFieldQuery, MongoDB, field storage — so you end up jumping between issue queues on Drupal.org.

When we moved the database back to MySQL, MySQL almost crashed immediately. We had to move it onto SSDs, which actually helped quite a lot. But even so, big feeds coming out of Drupal are always a problem. At some point the storage doesn't make a difference anymore — the storage isn't the problem. You get to the point where the Field API is simply too extensible to be fast, unless you cache it; we'll talk about that a bit later. When you run the Field API, it runs all the hooks — load hooks, view hooks and so on — for every field you load. With 40 fields per item and 1,000 items per list, a few hooks per field means well over 100,000 hook invocations for a single listing. At that point it gets really slow, and you just can't get around it.

And I think I know what you're thinking: we've all heard this — just put Varnish on top of it and be done with it, we can just cache it. Well, that's mostly hiding the problem, and even if you could do that, the downstream clients have specific needs for the integration feeds. When you're importing content from somewhere, you only want the content you actually need to import.
Otherwise you have a lot of junk content to wade through, which takes quite a lot of time — especially when you have a Drupal on the downstream as well. So you only want the things that have changed since your last fetch. What we needed to do was let the downstream clients request the feed by the timestamp, in seconds, of their last import: whatever timestamp they gave, we returned everything newer than that, and the downstream clients were happy. But with clients putting seconds in the URL, and the page changing based on those seconds, it becomes really hard to cache — it changes essentially every second. So we couldn't cache it well, and we knew we had to come up with some other way to serve the downstream clients properly and still cope with all the incoming traffic.

So we decided to index outside of Drupal. We already had some Solr indexing in place — we used Solr as the Views backend through Search API (at that point I think the two projects were just merging their schemas). Apache Solr Search Integration, which lets us index all content to an external Solr server, seemed to fit the bill well. So we indexed everything to Solr and decided to distribute it out through a simple REST API.

But there was a small problem — and when you think about it, it's kind of obvious. The customer wanted the video lists ordered by popularity, live. Their news videos went viral, and they wanted to be sure the most popular video always shows first. That means that to be able to order in Solr, we'd have to update the documents constantly. And if you know one thing about Solr, it's that Solr doesn't index fast. That's not its thing: it indexes slowly and searches fast. So when you get to the point where you have to keep updating a field that's indexed, you're in trouble. We tried different approaches — two-core setups where the sorting would be driven by a second core that we'd index separately, adding a field without forcing everything else to reindex — and nothing worked.

So eventually we decided to go with MongoDB, because it has a similar, or even more flexible, document storage mechanism — it's schemaless — and it indexes very fast. Even with a lot of keys, MongoDB indexes quickly enough that updating pretty much constantly isn't a problem. Now, I'll say that 2.6 has a problem here, because it locks the whole collection and not just a single document, which hurts with all this constant updating — but that should go away in 2.8, I hope. Still, it was way better than Solr at this. So then we were here: our data in MongoDB, with our REST API delivering content out.
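To make that "changed since" feed concrete, here's a minimal sketch of what such an endpoint could look like, using the plain http module (as mentioned, we didn't use a framework) and the MongoDB Node.js driver. The database, collection, field, and parameter names (syndication, videos, changed, changedsince) are illustrative, not the production schema.

```js
// A minimal sketch of a "changed since" feed endpoint.
const http = require('http');
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');

async function main() {
  await client.connect();
  const videos = client.db('syndication').collection('videos');

  http.createServer(async (req, res) => {
    try {
      const query = new URL(req.url, 'http://localhost').searchParams;
      // Validate the filter: seconds since epoch, nothing else.
      const since = Number(query.get('changedsince') || 0);
      if (!Number.isInteger(since) || since < 0) {
        res.writeHead(400, { 'Content-Type': 'application/json' });
        return res.end(JSON.stringify({ error: 'invalid changedsince' }));
      }
      // Everything newer than the client's last import, oldest first,
      // capped so nobody asks for the whole collection in one request.
      const items = await videos
        .find({ changed: { $gt: since } })
        .sort({ changed: 1 })
        .limit(500)
        .toArray();
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(items));
    } catch (err) {
      res.writeHead(500);
      res.end();
    }
  }).listen(8080);
}

main().catch(console.error);
```

The query only stays fast with an index on the changed field (db.videos.createIndex({ changed: 1 })), and unlike Solr, MongoDB has no trouble keeping that index current under constant writes — which is the whole point of the switch.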
That indexing part is done by a Drupal module we use to index content straight into MongoDB. It uses a direct connection to MongoDB for performance reasons — we don't go through an API. We've pondered that, and architecturally it would make sense and look quite a bit better, but currently it connects straight to MongoDB and indexes all Drupal content. It works much like Apache Solr Search Integration, though it's obviously a bit different: it indexes into MongoDB, and it tries to do it fast and without errors — we've had our share of problems with that indexing as well.

It also denormalizes the data for optimized distribution. We have a hierarchy of content types that all contribute to a video, but the end user just sees a single video. They don't care about the episode node and its connection to the season node and so on; they just want to see which series, which season, which episode, and then the video. So we denormalize the data so it's optimal to deliver straight out of MongoDB. This is a contributed module, but it's currently in a sandbox awaiting approval, because someone decided it was a good idea for the person without project creation rights on Drupal.org to create it — well, their call. I guess it'll sit in the queue for a while.

Then we get to delivery and distribution, which is the beef here. We have our data optimized for delivery in MongoDB and we want to deliver it out fast, but we also want some kind of handle on it on the way out: we handle the timestamps, we can search, we can apply different sort orders, different content owners, different hidden tags and so on. And here Node.js comes to the rescue.

A brief introduction to Node.js: it's basically JavaScript running on the server — there's no more magic to it than that. It runs on Chrome's JavaScript engine, V8, and it's non-blocking and event-based, as JavaScript is. Used correctly, it's blazing fast — and by blazing fast I just mean it's about eight times as fast to run as PHP code on the server. That, plus the non-blocking part that lets you do things in parallel (there's a small sketch of that below), makes it really fast for certain things. I'm not sure how good it is for really complicated things like running Drupal — PHP is probably just fine for that — but here it makes quite a difference in performance.

And here are my highly scientific requests-per-second figures. With Drupal, requests per second dropped under one at some point, even though we had proper hardware running it. With the Node.js backend, we can deliver from 800 to 3,000 requests per second on the hardware we have. To get those kinds of numbers from Drupal, you need quite a lot of servers, or caching — which means you're not delivering it from Drupal anymore. This comes straight from Node.js. Response times tell the same story: in Drupal they went over a minute at some point to get that list out; from the Node.js backend it's from about 80 to maybe 150 milliseconds, something along those lines. So there's the difference.

We didn't originally use any Node.js framework; we started with a really simple implementation. Thinking about it now, we probably could have used one — it wouldn't have hindered our speed all that much, because currently MongoDB is the bottleneck. When we hit the limits of the platform's performance, it's not in Node.js; it's in MongoDB, which at some point can't handle the query volume.
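About that non-blocking part: here's a minimal sketch of what doing things in parallel looks like in practice. The collections and field names are hypothetical.

```js
// Two independent lookups done in parallel rather than back to back.
async function videoList(videos, products, productId) {
  // With sequential awaits, total latency is the sum of both queries.
  // With Promise.all, the queries overlap and the total is roughly
  // whichever one is slowest.
  const [list, product] = await Promise.all([
    videos.find({ product: productId }).sort({ popularity: -1 }).toArray(),
    products.findOne({ _id: productId }),
  ]);
  return { product, videos: list };
}
```

This is what makes a thin delivery layer like ours fast: almost everything it does is I/O, and none of it has to wait in line.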
Also, we don't use a fronting nginx. Requests go straight from an F5 load balancer to the servers and are served by Node.js itself, listening on the port; it handles them with process signaling, using separate child processes so that a failure in one doesn't kill the others. We'll probably put nginx in front at some point — I'm not sure how much that would slow it down. We were supposed to test it last week but didn't have time; if you have any experience with how much slower Node.js is with nginx in front, I'd be really interested in hearing it.

The setup currently runs on three nodes. They share a MongoDB replica set across those three nodes, and they sit behind the F5 load balancer. The three nodes are all pretty sturdy — I think eight CPUs and 32 GB of RAM each.

The code itself ranges from very simple to pretty simple. The different backends are all separate services in Node.js, and mostly what they do is pass content out of MongoDB. That's what we do: the content is already optimized for delivery in MongoDB, so we just pass it out as fast as we can. There are filters, which we obviously validate. We also do some other processing — I don't think I have a slide for this — such as validating security tokens. It validates your session, validates your subscription, and can issue a security token for viewing protected streams like the ice hockey (there's a sketch of what such a token might look like after this section). That means that on top of the Node.js platform we can run the iOS clients, the Samsung TV client, Android clients and Windows Phone clients without anything else — there's no other backend for them. They list the videos from there, and they can log in through Node.js: there's a single sign-on server behind it, but Node.js passes the credentials along and gives you a session. It also stores the user rights and product information locally; the subscription information comes from the sales platform, which is a different system altogether — something horribly far away and horribly Java-ish. I haven't seen it and I don't want to touch it.

We also deliver the ice hockey live statistics to all the clients currently viewing the stream — you have the video there and the statistics under it, and we deliver those, obviously. And we handle extensive logging for the whole platform. Videos go through a number of systems before they arrive at their destination, so we log everything in one place: from a video in Drupal you can open the log — the log itself comes from outside Drupal — and see where the video has been and where it is now.

Currently the bottleneck is MongoDB, as I said; Node.js is doing really well here. If we ever get MongoDB to be super fast, the next limit we'd hit would be TCP connections on Linux. We actually had to tune those quite a bit, because when you're trying to deliver 3,000 or 3,500 requests per second, you're bound to have a lot of TCP connections open at the same time, so you have to tune the socket limits and the like in Linux — things like the open file descriptor limits and the local port range. Linux won't serve that many by default.
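The talk doesn't go into how the security tokens are constructed, but a common shape for this kind of short-lived stream token is an HMAC over the stream name and an expiry time, verified with a shared secret. A sketch under that assumption — not Nelonen's actual scheme:

```js
// Sketch of a short-lived stream token: HMAC over stream id + expiry.
const crypto = require('crypto');

const SECRET = process.env.TOKEN_SECRET; // shared with the stream gatekeeper

// Issue a token that authorizes one stream for ttlSeconds.
function issueToken(streamId, ttlSeconds) {
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  const mac = crypto.createHmac('sha256', SECRET)
    .update(`${streamId}:${expires}`)
    .digest('hex');
  return `${expires}:${mac}`;
}

// Verify: check the expiry, then recompute and compare the HMAC.
function verifyToken(streamId, token) {
  const [expires, mac] = token.split(':');
  if (!expires || !mac || Math.floor(Date.now() / 1000) > Number(expires)) {
    return false;
  }
  const expected = crypto.createHmac('sha256', SECRET)
    .update(`${streamId}:${expires}`)
    .digest('hex');
  return mac.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(mac), Buffer.from(expected));
}
```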
Here's just a glimpse of the API documentation we have. As you can see, there are different filters and order-bys, you can limit by time, and there are things like popularity from version two onwards. We try to serve the customers — the users of the API — as well as we can, so we try to keep the documentation up to date at all times.

For Node.js we're using the cluster npm module. Those of you who have used Node.js know that by default the main program only runs on one core, so if you have multiple cores you need cluster to get it running on every core (there's a sketch of that after this section). And we use forever to keep it running even if it crashes on bad input from somewhere — usually the SM-liiga statistics server, the Ice Hockey League one, which used to send us really crappy XML, and if we weren't careful with it, it could actually crash the backend.

Node.js in general is quite a change for a PHP programmer. I've been doing JavaScript for a long time, so I figured I could program JavaScript just fine — but it's not quite the same thing on the backend, where you have to do quite a lot with it. On the front end it's mostly events happening, some DOM alterations based on them, some Ajax calls to the backend. In Node.js you really have to do parallel programming: all the function calls, all the calls to different systems, run asynchronously. You have to handle that somehow — they return when they return, and when they do, you handle whatever they return at that point, and you're no longer in that part of the code, you're already running something else. There are npm modules to help with that, like promises, which make handling asynchronous calls quite clean. But even if it's a bit hard for a PHP programmer, it's also pretty eye-opening about what you can actually do with programming languages, especially if you've been in PHP for a long time.

During the lifetime of the project — it's been live for about 18 months now, we've been building it for longer, like two and a half years, plus the previous system before that — both Node.js and MongoDB have evolved quite a bit, MongoDB maybe even more than Node.js. MongoDB has added a lot of new features we've liked, such as using two array indexes at the same time in a single query, which helps a lot when you have to combine two tags and search on that — with an index, because obviously you want an index. And as I mentioned earlier, MongoDB 2.6 is out now, but 2.8 is rumored, or announced, to have document-level locking. We'd very much like to see what that does for our lock times, which are currently quite high under heavy load — we update a lot of content constantly, both because of the popularity counting and because things simply change all the time.

As for what's next, we have some stuff cooking. We're thinking of separating the different Node.js services fully, so that we can move them to different servers at some point.
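The cluster boilerplate itself is small. A minimal sketch of one-worker-per-core with crash recovery, roughly what the cluster module is used for here (./server is a hypothetical module containing the actual HTTP service):

```js
// One worker per CPU core, restarted if it dies on bad input.
const cluster = require('cluster');
const os = require('os');

if (cluster.isPrimary) { // cluster.isMaster on older Node.js versions
  // By default Node.js runs on a single core; fork one worker per core.
  os.cpus().forEach(() => cluster.fork());
  // Replace any worker that crashes instead of losing a core.
  cluster.on('exit', (worker, code) => {
    console.error(`worker ${worker.process.pid} died (${code}), forking anew`);
    cluster.fork();
  });
} else {
  require('./server'); // the actual HTTP service, one instance per core
}
```

On top of that, forever keeps the whole process alive from the outside — roughly `forever start app.js` — so even the master process dying doesn't take the service down.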
There are highly critical services, and then there are services that don't have to be served within milliseconds, because seconds is fine. For instance, they just released a Windows Phone client. Here that probably doesn't sound like it would change anything — who uses Windows Phones anyway — but in Finland, for some reason, we had this thing called Nokia. Microsoft bought the phone business, but Finns still use quite a lot of Nokia phones, which are Windows Phones these days, so we get a lot of traffic once there's a working client for them. There's a lot of traffic coming from the different mobile applications in general. We have one Backbone.js front end, we have various servers, and new consumers of our API keep appearing from within the corporation. So we may want to scale even further: moving the Node.js services out and separating them so we can run them on different servers — that would be good. Moving the actual MongoDBs off the Node.js servers would also be good, especially onto something with really fast I/O; that max-IOPS stuff sounds promising.

I should mention the front-end Drupal I referred to earlier. There's a Drupal 7 front end for the actual site used by browsers. It's not built by us; it's built by Wunderkraut — and I didn't get paid for mentioning their name, by the way. During rush hours it runs on 30 servers; we run on three in the background, so that's the difference. And they have Varnish on top — at least one Varnish server caching all the content, so the backend doesn't even see that much traffic. I think they can handle about a hundred logins per second when people are logging in. And that's no knock on them — plain Drupal just isn't that good at high-performance delivery.

Then there are some Drupal optimizations we're doing. We're moving the integrations outside Drupal for clarity: instead of custom modules, we'd like to keep the custom video integration stuff outside Drupal and just talk to an API. So we created a module — and I find it odd there wasn't one already — that fakes the Drupal 8 REST API on Drupal 7. It's kind of handy if you're thinking of moving to 8 at some point. We released that too, but it was released by another person who also didn't have project creation rights on Drupal.org, so it's likewise in a sandbox awaiting approval. Let's see when that comes out.

And that's pretty much it — that's my presentation. Thank you. If you have questions, please use the mic over there; apparently they're recording the whole thing, so try to walk there and then ask.

Can you hear me? Yeah. Okay. Can you talk a bit about the MongoDB indexer module that you wrote? I think conceptually the idea is that content is created using the standard Drupal GUI and it then updates MongoDB?

Yeah, it works much like the Solr indexing if you've used that — Apache Solr Search Integration, for instance. Whenever you change content, it stores the same content you have in Drupal outside of it. The MongoDB module works in a similar way: it doesn't touch Drupal's storage mechanism, it just hooks into the saves and stores the content outside Drupal for easier access. It's not a spectacular module at all — it just stores stuff outside Drupal in MongoDB, a replica of the content you see in Drupal's GUI, only stored in a slightly different shape so it can be accessed quickly without going through Drupal. Thank you.
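To make that answer concrete: the write the indexer ends up doing amounts to an upsert of one denormalized document per video, keyed by the Drupal node ID. Sketched here with the MongoDB Node.js driver for consistency (the module itself is PHP inside Drupal), with illustrative field names:

```js
// One denormalized document per video, keyed by node ID and upserted
// on every save, so the delivery side never has to join series, season
// and episode data together.
async function indexVideo(videos, node) {
  const doc = {
    nid: node.nid,
    title: node.title,
    series: node.seriesTitle,    // denormalized from the series node
    season: node.seasonNumber,   // denormalized from the season node
    episode: node.episodeNumber,
    rights: node.rights,         // where and for how long it may be played
    changed: Math.floor(Date.now() / 1000), // drives the "changed since" feed
  };
  await videos.updateOne({ nid: doc.nid }, { $set: doc }, { upsert: true });
}
```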
You mentioned at the start that you serve content based on popularity. With all the clients connected to Node.js, do they also write back to MongoDB, or do they provide that data back to the Drupal instance?

There's actually a separate backend service for popularity that counts the views and handles that data. It also goes out to a statistics server that handles the statistics they use for selling ads against the content. They were listing their tracking mechanisms at some point — they had eight, and I think they've now added one more, so around nine tracking systems tracking how people use videos, I don't know why. But one of them is within the Node.js platform, so yes, we do write the view counts there, and we use that information to update the popularity (there's a small sketch of that after these questions). Thanks.

Hello. Hi. I was just wondering, do you use any test framework for the Node.js backend? And if you do, which one?

A test framework for what — integration testing, unit testing? For unit testing we currently don't use anything. The code base is pretty simple, though we could — not a bad idea at all. But the system runs in three environments: dev, staging, and production. Staging and production are constantly tested by a bunch of Python scripts — some are plain Python, some run Selenium, all run by a Jenkins server. That's our integration testing: it checks that the system works in staging, and in production it just checks that nothing has stopped working. So if there's, say, an eight-hour pause in new videos, we know we need to alert people, because there's never an eight-hour pause in new videos, not even during Christmas. All right, thanks.

You're obviously powering quite a big site with a lot of traffic. Would it make sense to use Node.js to speed up smaller sites?

Well, it depends on the requirements, obviously — on how small it is and whether you can serve it with Drupal. It's extra hassle: if your content is in Drupal and you can serve it with Drupal — say, headless Drupal to the front end — I'd say why not just use Drupal, if that's good enough. That said, average response times in Drupal are still pretty high even with no traffic, so if you fetch a lot of content it's going to take a while, and Node.js might help with that. But I see this as something I'd use in situations where I really, definitely need backend performance. Thanks.
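Going back to the popularity answer above, a minimal sketch of what that counting and ordering amounts to in MongoDB — hypothetical field names again:

```js
// Counting a view is a single atomic increment; popularity ordering is
// an indexed sort on that counter.
async function countView(videos, nid) {
  // $inc is atomic, so concurrent viewers never lose counts.
  await videos.updateOne({ nid }, { $inc: { views: 1 } });
}

async function mostPopular(videos, limit) {
  // Needs an index on { views: -1 } to stay fast; it's exactly these
  // constant counter updates that made Solr a poor fit earlier.
  return videos.find({}).sort({ views: -1 }).limit(limit).toArray();
}
```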
Why didn't you go look for something more suited to your needs than Drupal? It seems to me you could have found a content management system that runs on MongoDB in the first place, so you wouldn't have had to do the conversion to MongoDB — or just used Symfony instead.

Well, Symfony isn't the solution here. If you use Symfony properly, I think it rather likes to have an SQL database behind it to fully use its capabilities, and Symfony is, I'd say, slow as well — you can't serve this kind of load with it, so you'd still need the Node.js there. As for building the CMS on Node.js: there are no proper CMSs on Node.js. If you've seen Ghost, which is the best CMS on Node.js, it's about WordPress 1.0 in functionality — so far from Drupal these days that it's a different thing entirely. And besides — something I didn't focus on in this presentation — the video content management system serves a lot of journalists, content owners, and production companies. They need user groups and detailed user rights, and they need to be able to configure those groups and rights in detail. Drupal offers that out of the box.

So you don't really use it just as a headless thing — you actually use the content management system itself?

Yeah. I don't use it for serving content, but I use the content management system itself extensively behind what you can see. If you use the iOS client you can't see any of it, but in the background Drupal is used quite a lot. Thank you, that makes more sense now.

Anyone else? Okay, that's it then. Thank you.