Hello, thanks for joining us for this very first meet-up of the Bonnet Tech Talk on do-it-yourself software-defined data centers. This is the first one, and we plan to have more of them. I'd like to start very high level and talk about what might come up here in the next meet-ups, so this is a very broad overview. Martin will join us later for the in-depth part, so you will see some code, but not from me.

So let's start. My name is Martin Espar. I'm working in the engineering team here at Quobyte, bringing software-defined storage to the data center. Let's see how far this goes.

This is a concept of a Google data center which they planned to roll out in 1999. This was their first idea of how to build a big cluster out of very commodity hardware, and they planned to have multiple of them; this is just a zoom into one of them. A network interface card, a hard disk drive, CPU, and so on. You know how these things look, but it's really just the bare minimum of what's required, and as efficient as possible. This is how they look today: thousands upon thousands of machines. The question now is: we have thousands of these machines, what do we do with them? And this is actually what the talk is about.

Software Defined Data Center, let me try to define this term. Maybe you have heard of infrastructure as a service or cloud computing. In the end, it's about building virtualized infrastructure which is completely decoupled from all the hardware you just saw, and making it somehow manageable. What helps to achieve this, and how they did it, especially if you look at Google, is that they started with software-defining everything, doing it in software. So we have compute, networking, storage, and lots and lots of glue software to manage, run, and drive all these systems.

From a user's perspective, you might know this: today, a credit card and HTTP requests, that's your cloud. Everything is very shiny, very fancy, and it's super easy to access very complex infrastructure. For little money you get big instances, whatever hardware you want to have. You have reproducible infrastructure, so basically you can code what you want to have, put in some HTTP requests, and you get your virtual infrastructure, your completely virtualized data center, within seconds; I'll show a tiny sketch of what such a request could look like in a minute. You're super agile with this: everything is up and running in seconds, instead of ordering new hardware and waiting for it to be installed somewhere in the data center. It's super elastic, so you can start very small, even on a single machine, and then just grow wherever you need. And you just pay for what you use. That's not bad.

From an operator's perspective, it's super centralized infrastructure management. With just a few people you can build such a system and run it. If Google or Microsoft or Amazon buys a cluster, that is hundreds of servers or hundreds of racks, they call it a cluster, push a button, and things are rolled out, super automated. There's close to no human intervention necessary to bring up complete clusters, except for putting them into the data center. In the end, this allows operations at very large scale. And it's very efficient, because with this virtualization they put many, many clients on shared hardware and get a higher resource utilization. There are no idling machines, there are no GPUs idling in a server.
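To make the "credit card and HTTP requests" part a bit more concrete, here is a minimal sketch of what requesting a virtual machine can look like. The endpoint, token, and payload fields are made up purely for illustration; real providers (AWS, OpenStack, and so on) each have their own APIs and SDKs, but the shape is the same: one authenticated HTTP call, and a machine starts booting somewhere.

```python
import requests

# Hypothetical cloud API, invented for this example.
API = "https://cloud.example.com/v1"
TOKEN = "my-api-token"  # the "credit card" part

resp = requests.post(
    f"{API}/instances",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "web-1", "cpus": 4, "ram_gb": 16, "image": "ubuntu-22.04"},
    timeout=30,
)
resp.raise_for_status()
print("instance is being created:", resp.json())
```

Because this is just code, you can check it into version control and replay it any time, which is exactly what reproducible infrastructure means.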
This efficiency and automation also enable economies of scale. If you are able to build one of these clusters and you know how to operate it, it's very easy to just put up another hundred of them. In the end, by making everything virtual and software defined, you completely decouple your infrastructure from the hardware, from buying the hardware. So you can actually plan for the next month and don't need to plan years ahead, like "this is the graphics card I would need for that project". That's not necessary anymore.

Cloud computing in general: is it just someone else's computer? Would you like to go for a private, a public, or a hybrid cloud? There are many, many talks about this and many issues. Can we, as a company, go private or public or something in between? You need to consider things like compliance and security, and there are many risks involved. Would you like to control the complete stack, and who owns the machines? If I have my complete data center in the Amazon cloud, for example, and they want to shut my company down for some reason, basically I'm lost and need to move things. And in the end there's the price tag. We have some customers who store many, many petabytes of data, and if you calculate the costs of just having that online somewhere in a cloud data center, it's simply too expensive, just for storing it, but also for the data transfer. Every bit that goes in and out of those data centers costs a lot of money. So at some point there's a tipping point where it's actually more performant and much cheaper to host it yourself. For example, Dropbox recently moved to a private data center because it's way cheaper than having everything in Amazon.

What are the enablers of these software-defined infrastructures? In the end, we just saw it in the Google example: the commoditization of hardware. Back then, many years ago, we had very special storage systems, very special compute systems, and so on and so forth. Today, you basically take the building blocks that you have, combine them however you need, buy hundreds of them, stack them up, install some software, and you're good to go. One of the important drivers here is that we have very fast networking, which enables a lot; I will talk about that a bit more later. And then there are advances in infrastructure research and software. A researcher like Leslie Lamport, for example, whose work companies like Google build on, has put a lot of new research into how to actually drive distributed computing, how to make things super reliable and fast, and how to use these big data centers in a meaningful way. And then, of course, there are lots of economic drivers: software is eating the world, we have lots of new businesses needing lots of infrastructure, so there is a huge demand for these virtual data centers.

A little bit of where we come from and what we require to get there. When you started working with computers, like 10 years ago, you actually knew their names. A machine was more like a pet that had a name. You built and managed it by hand: log in, install some software, basically hand-crafted. It had to be always available, and in the end you would put in more disk drives, more RAM, more CPU to keep your little pet nice and shiny. Today, a computer basically just has a name so you can find it in your data center. Its location is encoded into the name, nothing fancy anymore.
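As a small aside, "the location is encoded into the name" can be as simple as a naming convention that a script can parse. The exact scheme below (node, rack, data center labels) is invented for illustration; every site has its own.

```python
# Illustrative naming convention: node042.rack07.dc1.example.com
def parse_hostname(fqdn: str) -> dict:
    """Extract the physical location that a cattle-style name encodes."""
    node, rack, dc, *_ = fqdn.split(".")
    return {
        "node": int(node.removeprefix("node")),
        "rack": int(rack.removeprefix("rack")),
        "datacenter": dc,
    }

print(parse_hostname("node042.rack07.dc1.example.com"))
# -> {'node': 42, 'rack': 7, 'datacenter': 'dc1'}
```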
These machines are created fully automatically, and they are designed for failure, so you can basically lose a complete rack and everything will just keep working smoothly. They are API driven, everything has an API today, and they are built for scale-out: just add more and more and more machines and your performance scales out linearly.

Let's talk a little bit about software-defined compute. How do we compute the bits? Today we have bare metal provisioning, so basically you just start a machine and it gets, from the network, how the operating system should look and the software that runs on it. We have hypervisors to run virtual computers. We started out with sort of emulating complete computers; now we have a lot of hardware support so that they can run at near bare-metal speed. You run full operating systems on these hypervisors, but there's also something called unikernels, which is quite interesting: they rip out everything from the operating system and basically have a single process running directly on the hardware. And then, of course, containerization, Docker and so on and so forth: take Linux, for example, add some namespaces and cgroups, and you have these containers. And now there's also something like user-space containers, because you don't want to run your Docker containers as root. Google, again, put out gVisor, which is basically a user-space kernel. If anybody's interested, I'd love to talk about this project later.

Hardware support, how does that look? I said we started with emulating things completely in software. Now CPUs have VT-x and AMD-V and many very fancy names, but in the end it's always about slicing physical cores and providing parts of the hardware to the virtual machines. The CPUs also manage privileged and guest modes for your applications. Then you can share PCI devices. For example, take a network interface card, put it into your physical machine, and then provide access to multiple virtual machines, but with a direct connection to the hardware. So you actually share parts of the PCI interface with your virtual machines, which is quite interesting because you get direct access to the hardware. The same holds true for GPUs or TPUs or whatever accelerators you have: you either get some share of the device, completely isolated per process, or time slices.

How do we transfer the bits? Software-defined networking. In the end it's still network interface cards, cabled together, and then you can transfer. But what we have now with software-defined networking is a very high-performance data plane and a very flexible control plane, so you can configure everything in software again. And we have overlay networks which are put on top of these physical networks, so you can have your private virtual network on a very fast interconnect. In software you can build switches, routers, firewalls, load balancers; whatever we had years ago as dedicated, very expensive hardware boxes is now just some lines of code. Physical networks are also getting fast. 40 gig is more or less standard today, 100 gig is being rolled out everywhere, and there is even faster. In the end we have something like 10 microsecond round-trip times between machines, and this enables some interesting things, for example for software-defined storage, and for storage in general.
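To make "switches and routers are now just some lines of code" a little more tangible, here is a minimal sketch using nothing but standard Linux tooling driven from Python. It is not a production SDN stack, just the basic building blocks: a software switch (a Linux bridge), a network namespace standing in for a tenant VM or container, and a veth pair as the virtual cable between them. All names (br0, tenant1, the addresses) are invented for the example, and it needs root.

```python
import subprocess

def ip(*args):
    """Run an iproute2 command; the whole sketch needs root."""
    subprocess.run(["ip", *args], check=True)

# A software switch is just a Linux bridge device.
ip("link", "add", "br0", "type", "bridge")
ip("link", "set", "br0", "up")

# A network namespace stands in for a tenant VM or container.
ip("netns", "add", "tenant1")

# A veth pair is the virtual cable: one end goes into the tenant's
# namespace, the other end is plugged into the software switch.
ip("link", "add", "veth-t1", "type", "veth", "peer", "name", "veth-t1-br")
ip("link", "set", "veth-t1", "netns", "tenant1")
ip("link", "set", "veth-t1-br", "master", "br0")
ip("link", "set", "veth-t1-br", "up")

# Give the tenant an address on its private, software-only network.
ip("netns", "exec", "tenant1", "ip", "addr", "add", "10.0.0.2/24", "dev", "veth-t1")
ip("netns", "exec", "tenant1", "ip", "link", "set", "veth-t1", "up")
```

Repeat the namespace-and-veth step per tenant, add VXLAN tunnels or routes between hosts, and you essentially have the data plane of an overlay network; the control plane is whatever piece of software decides when to run these lines.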
Back to storage and those 10 microseconds. Before, you had your physical storage devices in the machine you access them from. But with, say, 10 microseconds of network latency between the devices and an average access latency of around 100 microseconds for SSDs, these 10 extra microseconds don't really matter anymore. So we can completely decouple access to the data from its actual physical location, and this is a big enabler for software-defined storage.

Our definition of software-defined storage is something like: get your commodity hardware, put in some disk drives, whatever you have, and run them, maybe hyperconverged, with your compute virtual machines alongside the disks. That is something you can build out of it. The interesting part with the virtualization of storage is that you completely decouple the logical bits from the physical locations of the files. This also gives you full flexibility in how to configure and run these systems: for every file you can decide in software where and how to store and assemble the actual data. And again, you can run multi-tenancy approaches, so shared or isolated users, clients, virtual machines, whatever you have, and they can all run on the very same set of disk drives and SSDs. Here again, many years ago we had these big boxes, high availability, master-master setups, something like that; that's where the reliability and performance of those systems came from. Today it's just software, it's algorithms which solve all these problems.

The blueprint, and this is as technical as I get in this talk, for how to build such a software-defined data center. These are just some buzzwords and just one idea of how you could do this. Basically, you start by buying some hardware, provision your cluster with a tool like Foreman or Matchbox, and we will see later how this works. Set up your networking, set up an orchestrator for VMs or containers; later we'll see how this works with Kubernetes. Then you need some sort of identity management, put in some software-defined storage, and add some more tools for monitoring, alerting, billing, whatever services you'd like to provide. Then you're good to go as an operator, and the client can use, for example, Terraform or Ansible to spin up their virtual infrastructure, deploy applications, and profit. Very simple. Works in 30 minutes. I hope so.

Software-defined X, so what's next? We have compute, we have storage, we have networking. This is more something that I read about in the last days, and I'm interested whether we might see something like it in the future: something like software-defined power. When is power available? When do we have too much? When do we have too little? Especially in a data center, we know exactly which machine consumes how much power, where things need to be cooled, how hot zones are. So we could actually try to schedule algorithms or jobs where and when power is available, and shift things around in the data center. One example of this are the Chinese Bitcoin miners who start their ASICs when there is too much water at the hydro power plants; a very simple example of this idea.

Okay, so conclusion: the data center is changing, and fast networks are actually the key to achieving all these nice things. Today, everything has an API and some tools to actually use it and make meaningful things with it.
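Since everything has an API, here is a small sketch of what the Kubernetes step of the blueprint looks like through that API. It uses the official Kubernetes Python client and assumes you already have a cluster and a local kubeconfig; the application itself (three nginx replicas) is just a stand-in.

```python
from kubernetes import client, config

# Assumes an existing cluster and a kubeconfig at ~/.kube/config.
config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="web",
                        image="nginx:1.25",
                        ports=[client.V1ContainerPort(container_port=80)],
                    )
                ]
            ),
        ),
    ),
)

# One API call and the orchestrator keeps three replicas running somewhere
# in the cluster, restarting them when machines fail.
apps.create_namespaced_deployment(namespace="default", body=deployment)
```

The rest of the blueprint works the same way: Foreman, Matchbox, Terraform, and Ansible are, in the end, also just programs talking to APIs like this one.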
And everything is achieved through decoupling and virtualization. The logical representation of anything, a file, a computer, a network, whatever, has nothing to do with its actual physical representation, and this is really the difference between the classical data center and the fully virtualized, software-defined one. In conclusion, I would say software already ate the data center, so there is not much more left to eat. Maybe you have an opinion on that, and I'd like to hear it, but I don't know what else could be virtualized here; since we have all these credit-card-and-HTTP-request clouds, that's it.

I have only four slides left and not much time, so you won't get all the details; there's free beer and free snacks, and here's a little bit about where you are and what we actually do. I'll try to be fast. We are part of this software-defined story, and we use commodity servers to turn them into software-defined storage. It's a data center file system, and basically for every workload that you have in your data center, we have something that fits. You have block storage, object storage, and actual file access. We have linear scalability, so add a rack, install it, and you scale linearly, and we're designed for lights-out operations, so fully automated, everything that you would expect when running on a lot of these machines. A little bit like the blueprint: take the machines, put in whatever you have, install Linux, download and install Quobyte, and you're good to go. And we will actually see how this works with some sort of hyperconverged setup that Martin Schoijer has.

For everything that we find in today's data centers, we have some sort of integration. Take OpenStack as one of these orchestrators: we are a perfect fit for that. All the container infrastructures can use us as well; we have a good solution for that. If you run big data, we have, for example, a Hadoop connector which is even faster than the original HDFS that they built. Bare metal, virtual machines, Windows nodes, whatever you have, you can access the parallel file system. And right now you can also run the Quobyte software inside other providers' data centers and use their virtual machines, their local NVMe drives, whatever they have. So basically bring your own shared file system; some customers are actually doing this. And with new features like async replication, we are also able to do this hybrid cloud thing: host data here, mirror some things into the cloud data center, let jobs run there, and get the results back.

One thing, and this is my last slide, is a part of our product which I think is really awesome: independent of which client you're using, you actually have access to the same file. This also means that if you put in a file from your macOS machine, you can use that particular file to run HDFS jobs on it, and look at it directly via S3, so an object is a file, it's all the same with us. And you can lock it and do your ACLs, so the complete story around this one file is shared across all the interfaces that you will find in the data center.

Thank you for surviving these first slides, thank you for listening and not falling asleep. I hope you have taken at least a little bit out of this very high-level story. Today, Martin will show us some code, so if you're here for some code, you will see it in a moment. If you have any questions, maybe now or later, we will be here and over there. Thank you.