This is the scalability session. I only put the slides together this morning, around 6 AM, so sorry if there are any errors or if I stumble over my words a bit; I didn't have much time to practice. Well, scalability, the title should say it all, I think. The agenda: I want to give a brief introduction, mainly about me, because this is a joint session with Frank, and I think most of you know Frank, but not me. Then I want to talk about the documentation that Nextcloud has, then some large-scale deployments from existing customers, and last, a concept design. The point of this presentation, actually, is that Frank will answer my questions. It was supposed to be a five-minute lightning session, which I thought would be a great place to ask all my questions, because I want to do a large implementation. But they bumped me up to this full session, so hopefully I'll still get those answers, if he doesn't stay too long in the hallway. After that, you can ask your questions, and hopefully Frank will answer those too.

I'm Dennis Pennings, from 368 ICT, so I'm not from Nextcloud. It's a small company in the Netherlands, about 20 employees, mainly system operators. I'm a system operator as well, so no developers, which is a little bit strange in this kind of crowd, I think. We did a small proof of concept, and we decided we want to use Nextcloud for our own large-scale implementation. It's supposed to be for 10,000 users, with an option to go to 20,000 users. As I said, my main goal is to get answers to all the questions I have about this scalability design, because I'm no expert; I just collected a lot of questions while we were figuring out how to build an implementation for 10,000 users. And if I get all the answers I want, I'll update the documentation with the information I get.
After we tried Nextcloud, we were really excited about the product, and we wrote a business case to deploy it to a lot of users. We started with a small implementation; that's this one. It's a small workgroup, with about 15 to 20 users on it. We just tested all the functionality. It's a really basic setup: all the roles on one system, one server, and you can connect an LDAP server. Those are the basics, and I think that part is pretty solid; it's used a lot.

The next step is the mid-size setup. It's a bit bigger: 1,000 users, 200 terabytes. You can also see in the picture that there are a lot more servers. It has two web servers, a storage server (in the picture, an NFS server), it's tied to an LDAP server for identity, and it has two database servers. And not to forget, a load balancer in front, in this case a software load balancer; I think most of you know it, right? This is the picture I saw first. Then I read a little further, and the text says database: MySQL/MariaDB, a MariaDB Galera cluster. That's not what the picture shows; this is the old database picture. So I think this picture needs to be updated. There's also a missing LDAP slave that should be here and here, I think. So this picture needs a little updating.

Then the large implementation. The documentation says 100,000 users and up to one petabyte. The picture is a little cluttered, sorry; I had some trouble getting everything on the slide. But you can see instantly that it has lots more servers. It actually shows the minimum number of web servers the recommendation gives, because the documentation says four to 20 application servers, and I think there are enough lines with just four. But that could be expanded from four to 20 web servers, or maybe more; the recommendation says 20.
You also see load balancers in front, hardware load balancers in this case, an F5 BIG-IP, I think. Also the NFS server, an LDAP master, and the same database servers, but more of them. Again, I don't think that's the right picture, because the documentation here also says Galera cluster, and that's a different setup. But you can see that there are more database servers here: four of them, quite big, with some read servers added on. So this is what the documentation says; this is what we looked at first. I was a little surprised to see that kind of schema, because you'll see further on that lots of customers only use Galera. I didn't read the documentation any further at first, because I thought I had seen it all in the picture. So that's a little misleading, I think, and we should update it. But this was the first place I went to see what I should do to create a large-scale implementation.

Then I found some other notes in that same documentation. You should use LDAP slaves on the web servers; I already showed you in the picture that there should be a local LDAP slave on each web server, for performance. And you should use SSL offloading on the load balancers.

[Question from the audience: does going from hardware to software load balancing change much for the rest of the environment?] I think yes, because we actually use hardware load balancers, but all the large customers use software. The main advantage of hardware load balancers, in my personal opinion, is that they contain dedicated hardware to do the SSL offloading. You offload the SSL work to the load balancer, so you don't need as many web servers, because you take some load off them. And because there is a lot of SSL traffic going on, that can be a major step.
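As a point of comparison with the hardware approach, software-side SSL offloading is usually done on the load balancer itself. A minimal HAProxy sketch might look like the fragment below; the hostnames, certificate path, and backend addresses are placeholders, not values from any deployment discussed here.

```
# Hypothetical haproxy.cfg fragment: terminate TLS on the load balancer
# and pass plain HTTP to the Nextcloud web servers behind it.
frontend https_in
    bind *:443 ssl crt /etc/haproxy/certs/cloud.example.org.pem
    # Tell Nextcloud the original request was HTTPS.
    http-request set-header X-Forwarded-Proto https
    default_backend web_servers

backend web_servers
    balance roundrobin
    # Sticky sessions via an inserted cookie (one common choice).
    cookie SRV insert indirect nocache
    server web1 10.0.0.21:80 check cookie web1
    server web2 10.0.0.22:80 check cookie web2
```

With this, the web servers never see TLS traffic, which is exactly the load-shedding effect described above, just without dedicated offloading hardware.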
I don't know at what scale that matters for the number of web servers you need. I don't know; we should test that. That's a good idea. [An audience member shares their experience: they tested hardware SSL offloading, including re-encrypting the SSL connection to the backend servers, and found the load from SSL termination wasn't significant enough to need dedicated hardware.] Yes, I think for SSL offloading you need ASIC processors that are specific to it; they're not in general server hardware. As I said, I'm no expert; that's my personal opinion, and I could find out more. [Another comment: only a few companies really need hardware load balancers for this; everyone else can do it in software. And the person recommending hardware sells hardware.] Well, as I said, I'm no expert; I'm just reporting what the documentation says. In the next part, I'll show the large customers and how they do it, and I can already tell you: they don't use hardware load balancers.

[Question: I may have missed something, but you mentioned both LDAP and Shibboleth; are you using both?] It's possible, although usually Shibboleth would be used instead of LDAP. There are scenarios where it makes sense to combine them: Shibboleth doesn't provide everything, so there are scenarios where you want to get group information back from LDAP while authenticating through Shibboleth. But it makes life complicated.
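The "local LDAP slave on each web server" recommendation mentioned above is typically done with OpenLDAP replication. A sketch of a consumer (slave) definition is below; the provider host, base DN, and credentials are made-up placeholders, and the exact directives should be checked against the OpenLDAP admin guide for your version.

```
# Hypothetical slapd.conf fragment for a read-only replica running
# locally on a web server, pulling from the LDAP master.
syncrepl rid=001
  provider=ldap://ldap-master.example.org
  type=refreshAndPersist
  retry="60 +"
  searchbase="dc=example,dc=org"
  bindmethod=simple
  binddn="cn=replicator,dc=example,dc=org"
  credentials=secret
```

Nextcloud's LDAP backend on that web server would then point at `ldap://localhost`, so user lookups never leave the machine.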
I have some questions about that too, on my last slide. The documentation says to use Redis for session management, memory caching, and file locking. We've actually had some problems with file locking even in a single-server setup, with WebDAV connections. We're actually still on ownCloud, because there was a little bug with OS X and WebDAV connections, but I think it's fixed now, so we'll be moving to Nextcloud one of these days; I'm really curious whether that issue is solved. The documentation also says to use memcached if you use Shibboleth.

So, on to the next chapter. After I went through the documentation, I searched some more online, and I was referred to CS3 by Jakub from CERN. Is he in the room right now, by the way? Well, if you see this online, Jakub, thank you. It took me two days to watch all the sessions, but I watched them, and all the information was there. I put it in a large spreadsheet, and I'm going to present it right now. CS3 is a conference for organizations that have large implementations of cloud sync-and-share applications; a lot of them use ownCloud. The conference is once a year, and if you have, or want, a large implementation, I definitely advise you to go. The sessions from 2016 are on the top URL, and the next meeting is in the Netherlands, I think. The website was down last night, or rather in maintenance, but two days ago I saw a date mentioned: the 30th of January. I think somebody published some information too soon. That's what the question mark is about; just check the URL, which should have the most current information about the conference.

That's also the next point: all the information I gathered is from January 2016, so it's nine months old. I've mailed a lot of those presenters for updated data and some extra questions, and a few of them have already answered.
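To make the Redis recommendation above concrete: in Nextcloud this is configured in `config/config.php`, roughly as in the sketch below. The hostname is a placeholder, and the exact keys should be checked against the admin manual for the version you deploy.

```
// Sketch of Nextcloud config/config.php settings for Redis-backed
// caching and transactional file locking.
'memcache.local' => '\OC\Memcache\APCu',        // fast per-server cache
'memcache.distributed' => '\OC\Memcache\Redis', // cache shared across web servers
'memcache.locking' => '\OC\Memcache\Redis',     // file locking
'redis' => [
    'host' => 'redis.example.org',  // placeholder; often a dedicated Redis VM
    'port' => 6379,
],
```

In a multi-web-server setup, pointing `memcache.locking` at a single shared Redis is what makes the file locks visible to every node.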
I will keep gathering this information and keep publishing it as I get it. I also have appointments with some large customers to talk about their issues, because I want to know their issues before we start our deployment. But the data is a little out of date; most of it will be updated on the 30th of January, I hope.

Next. This is basically the information I have gathered. These were all different sessions, so I don't have all the data for all specs, but it shows some interesting things. Actually, I thought I had gathered enough, but I found out I needed lots and lots more; that's what I'm going to keep working on. But let's see what we can find in the data I did gather. On the left, you see the recommended design from Nextcloud; it scales from five to 100,000 users. If you look at the large customers, the largest has 25,000 users, and they say they have a potential for 500,000 users. This was all in January, so I don't know what the count is now. MyCore is a good example: in January they had 4,000 users and they wanted to go to 15,000 by the end of the year. But all the data is from January. I haven't seen any implementations larger than 25,000 users; maybe you have? [Exchange with the audience: concurrent or unique users? How large? The biggest installation is close to 2 million users, in India.] Wow, that's interesting; I would like to add them.

The documentation's recommended design says four to 20 web servers; in this table it says 12. You can see that the customer base is somewhere in between: a minimum of two, a maximum of 20. But you can also tie that to the hardware, the number of cores and the memory of the servers. Sorry, yes? [Question about where the data in this table comes from.]
[Question: did this data come from CS3, or did you collect it yourself?] I collected it by watching the CS3 presentations; yes, that's what took all my time last night. The first column is the recommended design from Nextcloud; that's what I showed before, on the documentation website, docs.nextcloud.com. There are some large web servers in between, from Sciebo: 16 cores per node and 128 gigs. I think they're oversubscribed, so I don't think they have a really high load, if I look at the other specs. But this design tells you how many servers you would build up front. I think you would start somewhere and then watch how things progress: how your user base grows, how much load is actually on your web servers, and then scale by adding more RAM or more web servers. If you look at the recommendation, the number of nodes and the amount of hardware is roughly in line with the number of users at the top. [Comment from the audience: some of these numbers are what the operators planned for in their heads; it would be more interesting to see the actual load on these systems.] That's what I've asked for, but I don't have that data yet. Yes, the recommendation looks in line with what the customers are doing, I think.

The database: the recommendation is MySQL/MariaDB with Galera, but Oracle and Postgres are also supported; I'm not sure if that's still the case. But it's actually moot, because everyone uses MySQL or MariaDB in a Galera cluster, I think. There's one exception, CERN, but that's a big exception. I'm not going too deeply into it, but I believe they made a fork of the software and removed the reliance on the SQL server itself: they use their storage system as the SQL database. I'm probably not saying that exactly right.
[Brief exchange about the CERN setup; the point is that their users can't go directly to the storage, and sharing and so on still go through the application.] It's interesting to see that they don't have that many users, and not that much data; they're not the biggest. And they still use all the native ownCloud clients, so on the user end, nothing changed.

Also here, on the database side, you can see the design specification: four nodes, four sockets, 128 gigabytes. I accidentally deleted the number of nodes, sorry, but it should be four. Most of the customers have four nodes; a few have three, and three is the minimum for a Galera cluster, but the specification says four, and most customers are on four. There are some smaller servers in between, from MyCore; Sciebo has big ones, and SURFdrive as well. I know that SURFdrive started out with less memory, but they had some issues with the load on their servers, so they upgraded to 256. That's quite interesting when you see that they're only at 13,000 users, so I hope to get more information about that. By the way, that's a customer with a dedicated environment just for ownCloud.

Storage. That's a bit of a gray area. Of course, what should Nextcloud say about storage, apart from the fact that they support it? I'm not sure; that's one of my questions at the end. But you see that there are a lot of options: NFS, S3, Ceph, GlusterFS, GPFS, and Swift. The customers use a lot of GlusterFS, which I think of as an on-site solution, and a lot of large-scale storage solutions like Ceph and Scality. As I said, EOS is a flavor of its own, although there are two customers, CloudStor and SURFdrive, who are actually looking into using EOS as well.
This is also very hard to compare, because you can only really compare if the storage is used exclusively for Nextcloud, and a lot of these large customers have storage solutions for all their services. There are a few universities and a lot of research centers here, and they use their storage for all kinds of things. It's the same with CERN: they have this really big storage system, but only 1.3 petabytes are dedicated to ownCloud. So that's one of my main issues: what kind of storage should I set up?

I've got some information about that. This is actually one quarter of the sheet I have; I will publish the whole sheet as well. It's got more info, but I couldn't get it all on one slide. Let me walk through a few interesting things. You can see the number of files here. They all use PHP 5.5 and ownCloud 8; I think that's because in January 8.2 was the current version, and version 9 only came out in April, I think. Nobody is correcting me, so I'm guessing that's right. Again, the data is nine months old, so I'm curious whether any of these customers have actually upgraded to 9.

Something else interesting: storage. The design specifications from Nextcloud say to use SSD storage in your SQL servers, and the customers who mention what kind of storage they use in their SQL servers all use SSDs. One customer used Ceph as the main storage for their SQL servers, but they ran into problems at 4,000 users and moved to dedicated SSD storage on their SQL nodes; I'm not sure which one it was, it's in here somewhere. Network: that's the thing we talked about. The design specification says hardware load balancer, but the customers all used HAProxy, and most of them then went to MaxScale, so it's all software-based. Some other interesting stuff.
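For reference, MariaDB MaxScale in front of a Galera cluster is usually configured roughly as sketched below; the node names, addresses, and credentials are placeholders, and the directives should be checked against the MaxScale documentation for the version in use.

```
# Hypothetical maxscale.cnf fragment: split reads and writes
# across the nodes of a Galera cluster.
[server1]
type=server
address=10.0.0.11
port=3306

[server2]
type=server
address=10.0.0.12
port=3306

[Galera-Monitor]
type=monitor
module=galeramon
servers=server1,server2
user=maxscale
password=secret

[RW-Split]
type=service
router=readwritesplit
servers=server1,server2
user=maxscale
password=secret

[RW-Split-Listener]
type=listener
service=RW-Split
port=3306
```

The application then connects to MaxScale as if it were a single database server, which is one reason these customers could drop the hardware load balancer for the database tier.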
Yes: SURFdrive is looking into other storage systems. They have some problems with GlusterFS. One issue is that if they add another GlusterFS node, it takes two months to rebalance the data over all the nodes. They have about nine storage servers, so a little over 10 terabytes per storage server, and they have backup issues. I don't know the details; I hope to find out, because I have an appointment with Ron, who manages SURFdrive, and I'm very curious. [Question: Ceph?] No, GlusterFS; they had problems with GlusterFS. On the other hand, they're still running on GlusterFS, so they may not be really satisfied with the solution, but it works. They're looking into GPFS, EOS, and dCache as alternative storage solutions.

Docker. I was very happy to see that there are two customers who use Docker, because that was one of my other questions: are there large-scale implementations that use Docker? There are actually two. CloudStor in Australia is really happy with Docker; I saw their presentation, and they mentioned it twice in two slides, with the actual words "we should have done this 18 months ago", which was really nice to hear. On the other hand, the people from SWITCHdrive told me yesterday that they're using Docker as well but are not that happy with it. I hope there's an email from them in my inbox explaining why, because I'm very curious.

Other stuff: MyCore has an issue with the versions app, they say; it causes too much load. And yes, SWITCHdrive was the one that had problems when they put the SQL servers' storage on Ceph, and they moved to SSD. They also had one 100-terabyte volume, lazy-zeroed, and they moved to two-terabyte volumes; after that they could go beyond 4,000 users. Oh yes, the OS is interesting. Sorry, yes? [Question: object-based storage?]
Yes, that's Uni-P. They used Ceph, with the RADOS gateway and Keystone for Swift access, but they didn't say much more about it. It's the same question we have: should we go for object-based storage or not? That's what we were thinking too. I think there's also a connection to whether you want one site or multiple sites. The next slide should say something about it, but that's one of my questions at the end, so Frank can answer it.

So, if I put that documentation together... yes, sorry? Well, maybe that's the reason it's the recommendation from Nextcloud, but it's not a hard recommendation; I think that's mostly because of the enterprise support you can get. And I guess that's also the main question. The documentation isn't really clear on whether you should use Red Hat Enterprise Linux, because further on it says they support all major distributions that have enterprise support and an easy way to update the OS. If we go back here, to the docs where they show the system requirements, Red Hat Enterprise Linux is named exclusively, but further on in the text a lot of others are mentioned as well. So that's another of my questions; we'll come back to that.

One thing I was really happy about was this one, and it should be a good sign for tomorrow as well: they use Ubuntu, and that's what we use. So I'm really happy to see a customer of this size on Ubuntu. We've seen SWITCHdrive at 20,000 users; that's the maximum we want to go to, so that looks good for us.

Concept design. So if I put all of that together, the documentation, the website, all the stories from the large customers, I think I arrive at this. We'll start by designing for 10,000 users, because that's certain.
The 20,000 users isn't certain, so we want to start with an implementation and a scalable design that can handle 10,000 users; scaling beyond that isn't taken into account right now. Although honestly, I think the design could scale to 20,000 users without a problem, as we saw.

We'll probably start out with four web servers as VMs, with modest hardware and memory, and just see what the load on those servers is: network load, CPU load, memory load, and then address whatever we see. So if we have a memory problem, we might stay on four VMs but give each VM a lot more memory; with a network problem, we'd scale out to more VMs, or put more network cards in, which could be an option as well.

Everyone uses Apache, I think. There's one customer who uses nginx, but I don't have that much data, and most use Apache. I also saw some documentation suggesting nginx doesn't support all the features Apache offers for PHP; am I correct? Well, the recommendation says Apache and the large customers are on Apache, so we're going with Apache as well. PHP version 7, I think: if I look at the blog posts about the performance differences between 5.5 and 7, they're huge, so I don't see a reason not to go for version 7. Do you agree? [Frank: that's the challenge. PHP 7 is a lot faster and nicer, but you can really only use it the moment you use Ubuntu, because Ubuntu ships PHP 7, unlike RHEL, which is the most important one.] Ubuntu; I forgot my Ubuntu t-shirt, but OK.

Where was I? PHP, yes. The version: we're going for Nextcloud 11, I think. There are some improvements to S3 storage coming, I think; there's a question about that in the slides as well, going into a little more detail.
We're designing right now, and we don't expect to be operational until mid-2017 or so, so we have time to wait for Nextcloud 11. Also, because we're not sure which kind of storage we want, and we think we want Swift storage, this could be a big improvement.

We would go for MariaDB, as we're already running it, with the Galera setup, like most customers. We would also start with four VMs, with four cores and 25 gigs of memory, and then scale up or scale out, depending on where the load is. And of course, SSD storage on the SQL nodes.

Network: we've already got hardware load balancers in place, so for us it seems more logical to use them. But if that doesn't work out, we would just go to MaxScale, because everyone uses it and they're really satisfied with it, so I don't think that's an issue; it's more a personal preference to use the hardware load balancers we have on site.

Number of sites: that's interesting. I think at some point we'll go to multiple sites, but we'll start with one. So that's another big question: what kind of storage should we use, and if we want multi-site, how should we deploy it? I don't think there are any recommendations for that online that I could find. Yes? [Question about quota and file sizes.] I think we have a five-gigabyte quota per user, but we haven't really thought that through yet. Most files are small, one or two megabytes or so, but with modern phones you get 4K video, and we've had problems with chunked uploads of those large files. Where do you think the problem lies with that, and with what kind of storage? In the other slides, there are a few customers who say how large the files on their systems are, and most of them say...
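The MariaDB-with-Galera choice above boils down to a handful of settings on each database node. A sketch of one node's configuration is below; the cluster name and addresses are placeholders, and the full list of required options is in the Galera documentation.

```
# Hypothetical my.cnf fragment for one node of a three- or four-node
# MariaDB Galera cluster.
[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=nextcloud_cluster
wsrep_cluster_address=gcomm://10.0.0.11,10.0.0.12,10.0.0.13
wsrep_node_address=10.0.0.11
wsrep_sst_method=rsync
```

Each node lists all cluster members in `wsrep_cluster_address` and its own address in `wsrep_node_address`; three nodes is the minimum for a quorum, which matches the "three is the minimum, four is recommended" point from the customer data.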
There are two or three of them who make this claim, and they say 90 to 95% of files are smaller than 10 megabytes. But there are a few customers who say they have a few users with a lot of data, a lot of files, and a lot of large files as well.

Okay. Storage. That's a hard one, at least for us, because we don't know which way we want to go. This is where we need the most help, because our storage solution will be just for Nextcloud. I know Nextcloud is not a storage provider, but I'm hoping they can help me a little with my choices. One site or two sites is an important decision to make. And if you look at the space we're going to use, it's all within the limits of what the other customers use as well, so I'm not that worried; we're just not sure which storage system to use.

As I said, we're going to run Ubuntu. Our main identity source is Active Directory, so I think that should be fully supported; I saw it in the previous presentation, but I have some questions about the setup for that. And we like Docker a lot. There are two customers who use it, at least one of them successfully; the other I'm not really sure about. But that's enough for me to just go for it, because we see a really big advantage in it for deployment, and also for rollback.

So, this is where I hand it over to you, Frank, so you can stand here and answer my questions. And now we're going to sit back.