Again, ladies and gentlemen, it is time for our next session, Case Studies 2. If you are not attending Case Studies 2, we do ask you to leave the room so the next group can enjoy their session.

Welcome to our session. My name is Christoph, I am co-founder of Scale-Up Technologies, and this is Frank, our CEO at Scale-Up Technologies, who runs our OpenStack cloud. To give you a brief introduction to who we are and what we are going to talk about: we are a small service provider based in Germany, started in 1998, so a long, long time ago, originally doing regular web hosting. In 1998, at least in Germany, you still had to explain to your customers what a domain name is and that they should get one for themselves. That is basically how we started, and our business evolved over the years. We are headquartered in Hamburg in the north of Germany, and today we have four data center facilities in Germany, two in Hamburg and two in Berlin. We mainly do dedicated servers, colocation, virtual servers, and nowadays also cloud.

To go back a few years: back in 2009 we had more and more small companies and startups approach us, interested in our services. The challenge was always that those small companies did not have the budget of an enterprise. They all had grand plans, they wanted millions of users on their portal or online shop or whatever it was, but they did not have the money to start out. So back in 2009 we were researching solutions that would let us offer our services to customers more flexibly. In 2009 the term "cloud" was not really around yet; most people called it utility computing. We stumbled upon a small startup based in Southern California called 3Tera, whose solution was called AppLogic. It was, at least in my opinion, maybe the first true cloud platform on the market back then. It was based on Xen as the virtualization layer and had all kinds of open source tools under the hood; however, it was a commercial solution. We evaluated the software, tried it out, and decided to go forward with it.

What was really cool about it was the Visio-like graphical user interface: you could graphically arrange a server infrastructure on the screen, just as you do in Microsoft Visio, connect the web tier to a database layer, put a firewall and a load balancer in front, save it, and it would actually spin up on the Xen virtualization. So it was pretty cool stuff back then, and we had lots of customers using it. However, as these things go, the company was acquired by CA Technologies, I think in 2010 or 2011. The product never really gained the traction we expected and was eventually discontinued in the summer of 2015, last year.

Another thing we did back in the days when we were using AppLogic as our cloud platform: it did not really have a multi-tenant interface for our customers. Since there was nothing out there, we built it ourselves. We started another subsidiary in Germany doing software development, and they developed a web-based portal for AppLogic. At that company we also built in basic support for OpenStack. We started doing that in 2010 and had the first small proof of concept at the very beginning of 2011.
We did try to market the software, but it did not work out because there was just not enough traction for the AppLogic platform, and the OpenStack support in the software was really only a proof of concept at that point.

So, to come back to OpenStack and what we are doing with it. Back in 2011, when we were working with AppLogic, we were also looking for ways to offer a cloud storage solution to our customers. Since we had lots of startups as customers, all using Amazon S3, we were looking for ways to offer something similar. Initially we selected a commercial solution for that, and we actually built the storage nodes ourselves. I have included a link here: if anyone knows the company Backblaze, also a US-based company, they open-sourced the design of their storage pods. Pretty cool stuff back then. We ordered those chassis from a company in Canada that manufactured them and built out the servers ourselves. However, we ran into major issues getting the whole infrastructure into production. It turned out to be the firmware of the hard drives we were using. I will not name the vendor; you can ask me afterwards if you want to know. It took over half a year for them to figure out that there was something wrong with the hard disk firmware, so we stopped the whole project and canceled all the contracts. In 2012, when the firmware for the drives was fixed, we restarted the whole project based on OpenStack Swift.

So, on to the reasons why we chose OpenStack. As I said, we had used two commercial solutions before and decided to do it differently this time. Something like 99% of all our servers run pure Linux and Unix, so open source is part of our heritage and we use all kinds of open source software; that is why we decided to go with OpenStack. Another reason: you can read the code yourself and debug things yourself easily, and there is a great community out there. A few years ago I was at the second or third summit, back in Santa Clara; it was much smaller then, but even then there was a large community, which is true for open source in general. That is another reason we went down this path, and there was a proven track record, at least for OpenStack Swift. Four years ago there were still lots of issues with Nova, but Swift worked pretty well from the beginning.

So I will hand over to Frank. As I said, he runs our OpenStack infrastructure and will go into more detail about how we set it up and what we are doing there.

Okay, first of all I would like to point out one major thing: this is not meant to bash or start a flame war about existing software or the big companies who contribute a great deal to the community. It is meant to show that there is a way for medium-sized companies to run their own OpenStack environment and do it their own way, and to show how you can find your very own path to a highly customized environment. It works, and you do not need big numbers of servers or big money to invest up front to play the game. We have quite a few years of experience with cloud computing, with AppLogic before; we still have customers running private clouds on AppLogic in our data centers, and they are having a hard time migrating over to the new stuff.
So yeah, what I wanted to tell you is that I do not want to say anything bad about all the licensed software or packages you can buy. Our intention was to have a scalable cloud storage solution. Our first setup was with DevStack, as I said, and that is where we ran into the first problems. In those days Keystone did not support that many auth modules, but to open the API to a lot of different clients, like an S3 client or Cyberduck, we needed a newer version of Keystone. So pretty early on we started to just install Ubuntu servers with the cloud archive repository enabled and did a lot with pip install and the like. That was our first attempt at using out-of-the-box packaged software, and it did not work out for us.

The next attempt: the guy working for us in those days was a big fan of Proxmox. Maybe you know the software; it is virtualization software as well, based on KVM and DRBD. We had two big Proxmox servers, and it had a lot of pros: it was reliable, proven software, and it had a web UI. But we ran into problems with that solution as well. The major problem with virtualization at this level, at the foundation of a cloud, is that you cannot push that much bandwidth through virtualized interfaces. If you use the onboard interfaces of a dedicated server and have many VMs connecting through those NICs, you will get into trouble, especially under high workloads like the load balancer tier where all the packets go through. There is a study by HAProxy on what virtualization costs in terms of NIC performance; the best result in that study, which is only two or three years old, was around 70% of bare-metal throughput. There are newer drivers now, so the numbers will be better, but you still lose a lot by virtualizing things like that.

So the first thing we did was take the load balancer in front of the Swift cluster and put it on dedicated servers. We use HAProxy a lot as a reverse proxy in front of the Swift proxies, for several reasons: it is very performant software, it has a good community, and you can easily configure HA setups with it, whether with Corosync/Pacemaker, with ucarp, or whatever. The other thing is that since version 1.5, I think, HAProxy supports SSL termination, which is what we needed as well, and you can share the session cache across the two devices with memcached, so any failover is seamless; you do not have any packet losses or anything. That was ideal for our solution.

Nonetheless, the Swift proxy still stayed on the Proxmox cluster, and we had a Keystone on the Proxmox cluster as well. The Proxmox cluster in those days was holding Swift, MySQL, a MongoDB, Keystone, and Ceilometer, and that was pretty much it. At the same time we decided that we would not only have the cloud storage but enlarge our environment and build our own OpenStack cluster, so we left this old Keystone and this old environment alone until we were able to move over and integrate the whole thing into the big OpenStack cluster we built afterwards.
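To give an idea of the setup described above, here is a minimal sketch of an HAProxy frontend doing SSL termination in front of two Swift proxies. It is only a sketch under assumptions: the certificate path, addresses, and server names are placeholders, not the configuration used in the talk.

    frontend swift_api
        # SSL termination, available since HAProxy 1.5
        bind *:443 ssl crt /etc/haproxy/certs/cloudstorage.pem
        mode http
        default_backend swift_proxies

    backend swift_proxies
        mode http
        balance roundrobin
        # requires Swift's healthcheck middleware to be enabled on the proxies
        option httpchk GET /healthcheck
        server swift-proxy-01 10.0.10.11:8080 check
        server swift-proxy-02 10.0.10.12:8080 check

A second, identical HAProxy node plus a shared virtual IP (Corosync/Pacemaker, ucarp, or similar) then gives the seamless failover mentioned above.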
So, for our first attempt at the new cluster, well, I have to explain one thing: over in Germany and Europe there is a big community using Debian and Ubuntu; I know that over here in the States it is more and more Red Hat and CentOS. So we decided to use Ubuntu, and we decided to give Juju and MAAS a try. I do not know if you are all familiar with Juju and MAAS: MAAS does bare-metal provisioning, and Juju is a toolset with a lot of charms that give you an out-of-the-box OpenStack install and so on. It turned out that for our needs this was not the ideal solution. I was able to provision things, but very often I had to work on it and reconfigure it; we had iLO interfaces, iDRAC interfaces and other BMCs, and different problems with MAAS in the beginning. And because our setup is, as you will see later, different from others at some points, I always had to change the Juju charms myself, and for a small number of servers it took more time to work on the charms and make them provision the instances automatically than to just do it by hand. So it was of no use for us. For other companies it might well be the ideal solution: if they have homogeneous hardware of one or maybe two types and big numbers, they will want something like this, or the equivalent from Red Hat or another company. But for a medium-sized company like us it was no solution at all. So what we did was basic Ubuntu installs; we had that basic install as an image on a PXE server, so provisioning was easy that way as well, with just one or two manual steps in between.

Then we went on to integrate the two environments, that is, the cloud storage and the OpenStack cluster. In the first step we were still using the old Keystone, because the new OpenStack started on Juno while the cloud storage Swift was still on Grizzly, I think, so we could not integrate them that easily. So in the first step I just created a project and a user inside the Swift environment and integrated it into Glance, which then served all the images from our cloud storage. From that moment on we did not need any local image store on the controllers; all the images came from the cloud storage. What surprised me was how easy it was to lift Swift from Grizzly to Juno, seamlessly, without any interruption to the business. Afterwards, the Swift integration was just one part of the big OpenStack cluster.
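To make the Glance integration a bit more concrete, here is a minimal glance-api.conf sketch for a Juno-era install using Swift as the image store. The endpoint, tenant and user names, and password are placeholders, and the exact option placement can vary by release.

    [DEFAULT]
    # on some releases default_store belongs in [glance_store] instead
    default_store = swift

    [glance_store]
    stores = glance.store.swift.Store
    swift_store_auth_version = 2
    swift_store_auth_address = https://keystone.example.com:5000/v2.0/
    # tenant:user created in the Swift/Keystone environment
    swift_store_user = services:glance
    swift_store_key = GLANCE_SWIFT_PASSWORD
    swift_store_container = glance
    swift_store_create_container_on_put = True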
What you see here is our OpenStack cluster as we developed it for our needs. There is one design decision that is not a failure, but it is not very pretty: all the object storage traffic still goes through the management network. If you expect larger workloads, you should not do this; you should have a dedicated network just for the image service, because if many users run backups or many requests come in to provision new images, you may get your management network into trouble.

As you see, we mainly have three networks: the management network; the internal network, where we use VLANs and plan to implement VXLAN within the next months; and a dedicated iSCSI network. The reason I designed it like this is that in our company we use LACP link aggregation a lot for HA. Very often we have a switch stack of four or more switches connected in a ring, and we have link aggregation across the stack. So if you have bonds on the servers with a link aggregation mode activated, it does not matter whether one switch port fails or a whole switch fails: you will still have one gigabit of throughput, and as long as everything is fine you have two gigabit of throughput. And it is a low-cost solution. Everybody tells you that you have to buy 10-gig or even 100-gig switches, but that is no way to go for a mid-sized company. On the other hand, we have quite a lot of servers that ran for two or three years until the customer wanted new hardware, and we like to reuse that hardware, which is still good and still meets our needs.

The external network is an LACP bond as well, and we have two Neutron servers. We have not changed to DVR and are not planning to. We are thinking about SDN; that is something that interests us a lot, and we really plan to go there. I will come back to the Neutron servers later; let's look at iSCSI first. As we have experienced a lot with other solutions, even with AppLogic and external storage, it does iSCSI a lot of good to have its own network, and iSCSI really likes jumbo frames; with a big payload per packet it is ideal for iSCSI. So you should not mix the iSCSI transport with the management network or anything else.

One reason we built the environment like this is that at almost every point you can scale horizontally; you do not need to scale vertically. We prefer to just add more compute nodes, or even add more NICs to the servers and do link aggregation over three or four, rather than buy new, bigger servers.

All right. What I describe here is block A and block B. In almost every tutorial and installation guide you find on the internet, you read that you can just take two servers, put all the controller services on them, and use containers on those servers. What I think you should not do is use virtualization, because, as I saw, you lose a lot of bandwidth on your interfaces; but you can use containers. I do not really see the gain, though: if you are lacking file handles, add file handles. As long as you find a sensible way to group things together, it is perfectly fine like this.

Block B is like a usual controller HA setup: you have Nova there, Keystone, Cinder, Glance, Heat, and they are all active-active. Most of them use memcached, and if one fails, the other will do the job. In block A we have a Galera cluster: three nodes, quorum, and master-master replication. We have HAProxy, which we really use a lot, as you will see later on. There are Horizon instances on all three nodes, a RabbitMQ instance on all three, and Ceilometer, most of it, on all three; the central agent is only allowed to run on one node, so we added some constraints to the Pacemaker configuration for it. And you have the Neutron services running on these three as well.
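To go back to the link aggregation and jumbo frames for a moment, here is a minimal sketch of what such a bond can look like on Ubuntu 14.04 with the ifenslave package, for example on the dedicated iSCSI network. Interface names and addresses are placeholders, and the switch ports have to be configured for LACP and jumbo frames as well.

    # /etc/network/interfaces (sketch)
    auto eth2
    iface eth2 inet manual
        bond-master bond1

    auto eth3
    iface eth3 inet manual
        bond-master bond1

    auto bond1
    iface bond1 inet static
        address 10.0.30.11
        netmask 255.255.255.0
        # jumbo frames for the iSCSI network
        mtu 9000
        # 802.3ad is LACP mode
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate 1
        # slaves declare bond-master above
        bond-slaves none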
These three controller nodes are all managed by Corosync and Pacemaker, and we have location constraints for all the different types of services, so each service has a priority on one of the three nodes, each on a different node. One thing we started to use but currently do not: we do not load balance the MySQL requests. That has one reason: we ran into problems with Keystone. Keystone somehow disliked load-balanced MySQL requests; I did not find out why, and I think it will get better with the next version. So for now I just have one virtual IP, and we only talk to one of the Galera servers at a time. There is no performance issue right now, but we will have to work on it, and then we will have MySQL load balancing as well.

Another thing that is different from most environments: we have three sets of API proxies, all HAProxy. The Swift one you already saw. Then, if you remember block A, there are the internal proxies, also HAProxy, that all the internal requests go through; that is really fast and reliable, and there are no security issues there. And then, as my co-founder asked me, I had to provide the public endpoints as well. Of course customers want their public endpoints. The first thing you do if you have to provide something you do not have yet is look at your competitors, see how they do it, and then try to do it better. One of our competitors makes you provision an instance inside your project, and only from that instance do you get external API access; I saw other approaches like that, and I did not want this. I do not want the external API or any endpoint to be reachable only from the inside, because then there has to be some ugly workaround routing to get there.

What we use a lot at Scale-Up Technologies are OpenBSD firewall clusters. They are perfect firewall clusters for the usual firewalling services; you can home several customers on one cluster, or you can use one for your own environment. The only thing you will not get is any certification, so if your customer asks for security audits and certifications, you will end up with Cisco again. We have several Cisco clusters as well, but for our own use this is a perfect solution. I do not know if you are familiar with it; the packet filter is called PF. You can build very sophisticated setups, you can monitor the frequency of requests and so on, and set up your ACL ruleset exactly to your needs. So I thought it might be a good idea to have the public endpoints on a firewall cluster where I have all the rulesets at hand to observe the traffic coming in, the quality and kind of traffic. And there is no problem installing HAProxy on OpenBSD; it is almost the same as on Linux.

So we ended up with this public API proxy, where numerous clients come with their kinds of requests (one likes the URL this way, another that way), and you just do some rewrites, and it works perfectly. We have this firewall solution, and we have the pretty normal internal solution, which is a good HA setup: if one node fails, there is no problem at all. Okay, I think we are running a bit short on time. We have talked about the networking already; you already saw what we did there.
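To make the location constraints and the single MySQL virtual IP mentioned above a bit more concrete, here is a minimal crm configure sketch for Corosync/Pacemaker. Resource names, node names, and the ceilometer-agent-central init script name are placeholders and depend on the distribution; this is not the actual cluster configuration.

    # virtual IP used to reach one Galera node at a time
    primitive p_db_vip ocf:heartbeat:IPaddr2 \
        params ip="10.0.10.50" cidr_netmask="24" \
        op monitor interval="10s"
    # a service that may only run once in the cluster
    primitive p_ceilometer_central lsb:ceilometer-agent-central \
        op monitor interval="30s"
    # location constraints: each service prefers a different node and fails over to the others
    location loc_db_vip   p_db_vip             100: ctrl-01
    location loc_cm_agent p_ceilometer_central 100: ctrl-02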
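And as a rough idea of the PF side on the OpenBSD cluster in front of the public HAProxy, here is a small pf.conf sketch; the interface, addresses, and rate limits are placeholders, not the ruleset used in practice.

    ext_if  = "em0"
    api_vip = "203.0.113.10"
    table <api_abusers> persist

    block in quick on $ext_if from <api_abusers>
    # allow API traffic to the local HAProxy, rate-limit overly chatty sources
    pass in on $ext_if proto tcp to $api_vip port 443 \
        keep state (max-src-conn-rate 100/10, overload <api_abusers> flush global)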
Maybe I can say two or three sentences about the Neutron servers. It is really just a basic, normal setup. If you are not doing DVR, if you are not doing software-defined networking, if you are still on the old Neutron stuff, you can have L3 agents on any number of Neutron servers, and you can have DHCP running from several servers. You cannot have the metadata service on more than one server, and you cannot have the load balancer service on more than one server, but you can easily build an active-passive setup with Corosync and Pacemaker. You have to do some scripting for the load-balancer-as-a-service part, because what it basically needs is a call which says: all your load balancer instances now run behind this router ID, not that one anymore, which is really simple. And then you have a reliable networking solution as well. That's it.

Okay, so what is left? We have this one slide left. What we will do next is upgrade to Liberty, probably next month, and I guess soon thereafter, or at least not with such a large gap, to Mitaka. We will need to do IPv6 at some point; at least in Europe we are out of IPv4 addresses, so we need to do something there. We have had a few customers requesting VDI setups in the past, so in the last couple of weeks we did some research on how to use OpenStack to run virtual desktop infrastructures, and we figured out that we may need different compute nodes specifically to run Windows there. And one idea of mine, as founder of the company, the big picture: maybe one day we will have a big OpenStack setup spread across all our data centers as the foundation for all the services we provide. But that is certainly something for the future. So if there are one or two questions, I guess we still have a minute or two to answer them. Thank you for your attention.

Well, there is a question. I guess you should use a microphone so the others can hear what you are asking.

My question is about Pacemaker. You seem to use it a lot on the control plane. Have you used it in any VMs running on the cloud, and how well does that work with multicast, or do you not use multicast? You can use unicast with Corosync and Pacemaker, and I do that a lot; I do not use multicast. The other thing is that you first have to configure the port with Neutron that will hold the virtual IP. It will not work out of the box: you have to tell Neutron that you are going to use a virtual IP and that it belongs to those two VMs, for example, but then it works perfectly. Thanks.

Okay, there is one more question; it is hard to see from up here. Are you running OpenStack at multiple locations, and if you are, how are you scaling it? At this point we only run it in one data center location. It is still a small installation; most of our customers still use other things, and we are trying to convince them every day to do something better with OpenStack. But this is certainly something we will do at some point. I have to explain: we started with OpenStack around the end of 2014, beginning of 2015, and we had about three or four months to production. Now we have more and more customers and a lot of our own stuff running on OpenStack, but we still have hosting services on AppLogic, our own stuff is not fully migrated, and we are busy migrating everything over into our cluster.
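Going back to the two Neutron points above, the virtual IP for Pacemaker inside VMs and the active-passive failover scripting, here is a rough sketch using the Juno-era neutron CLI. The IDs and the address are placeholders, and a LBaaS setup would need an analogous step for its agent.

    # tell Neutron that a virtual IP may show up on the ports of both VMs
    neutron port-update $PORT_ID_VM1 \
        --allowed-address-pairs type=dict list=true ip_address=192.0.2.100
    neutron port-update $PORT_ID_VM2 \
        --allowed-address-pairs type=dict list=true ip_address=192.0.2.100

    # the kind of call a failover script makes: move a router to the standby L3 agent
    neutron l3-agent-router-remove $ACTIVE_L3_AGENT_ID $ROUTER_ID
    neutron l3-agent-router-add    $STANDBY_L3_AGENT_ID $ROUTER_ID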
Do you expect downtime when you upgrade to a new version, and if you do, how do you deal with it? It depends on what you mean by downtime. Rackspace now has this line, 99.9% or 99% API uptime, and that is a big number, because if you are doing upgrades you will possibly run into API downtime. But that does not necessarily interfere with the customer services: customers will not be able to manage their stuff for a while, but availability stays up.

I think what I meant was that the VMs keep running; maybe you cannot put new things on the cloud, but in general the VMs are still up, and that is the downtime I meant. We are not talking about infrastructure downtime or anything like that. So do you expect very little VM downtime during an upgrade? None at all, because what I have done twice already is upgrade the test environment, which is almost the same; it has only one compute node instead of several, but the rest is identical. I did it twice, I had some downtime on the APIs, but I had zero downtime for the VMs.

What Galera cluster version were you using? I am not sure, I cannot tell you off the top of my head. The reason I ask is that you mentioned Keystone had difficulty with load balancing across the Galera cluster, so I was curious which version you were running. Actually it is Percona, I think Percona XtraDB Cluster. Shoot us an email and we will look it up, no problem. It must be 5.5 or 5.6 already. Was that on Ubuntu? Ubuntu Trusty.

Okay, another question, sort of about the actual theme of this talk, which is a small company bringing up OpenStack, so two quick questions. One: how big is the team that manages OpenStack, how many people? Well, the whole company is 10 people, and OpenStack is mainly Frank; I do some stuff, and our other engineers help out where they can, but it is mainly on one shoulder, at least all the research and so on. If you are a small company, you cannot really set one or two people apart to do research all day. When it comes to development, then it is mainly me. When it comes to maintenance or just normal work, we have five technicians and they can do it as well.

And the other question: roughly how much did the hardware, the infrastructure, cost? We had almost no cost for the hardware. We bought four switches, and we have a steady stream of servers coming back in as customers go out of service, so there is actually too much hardware, we cannot use it all. The only thing we really need to buy is networking gear. I guess that answers the question; the iSCSI network was probably all new. Yeah. Okay, well, thanks. Okay, thank you. Thank you. Thank you.