So, hi, I'm Jacob Rosenberg from Bloomberg, and we've been talking at events like this for the last couple of years about building our cloud and the infrastructure that goes into it, the things that we hope to do with it, and all of the interesting technology challenges of working in the space we're in, but not so much about actually getting people to use it. So today I'm going to talk a little bit about an idea: people don't just adopt this technology. We have a large developer community, and you have to incentivize them, you have to convince them that they want to make the change to a new world and a new technology stack. And so the title of this talk is "Your cloud is amazing. So why aren't you running everything on it?" Before we jump too deep into this, a quick disclaimer. I call this the "lies, damn lies, and conference presentations" section. You see a lot of really interesting presentations at events like this: speakers talking about their incredible dream state, how wonderful their world is, all the cool things that they've built and deployed and are theoretically using all throughout their infrastructure. And you get really excited. You start to believe the dream. You start to think, oh, wow, their world must be great. They've got unicorns and rainbows. They've got amazing infrastructure that solves all their problems. They sleep weeks at a time without an on-call page, and life must be grand to be them. But then you have to go home. And when you go home, you confront your reality, and reality may not be as fun as the imagined one that you had. Maybe in that reality, things are broken. Maybe in that reality, you're in the midst of migrations from an old technology to a new one. Or you've got application teams that don't want to build on your infrastructure and want to do something different. You've got piles of tech debt as high as a mountain everywhere you look, and you start to despair.
You start to look around and wonder, why can't my infrastructure, why can't my world be as great as that cool thing that I just saw last week? And you end up in a kind of despair. So this talk is not about the fantasy. Don't despair. Don't give up. The world is actually a lot more nuanced than "everything is great and our infrastructure is perfect" or "everything in your world is terrible." It's a much more complicated thing than that. This is a talk about facing your demons and then building towards the thing that you actually dream of doing. We all have technology demons that we deal with. Sometimes these are in the form of tech debt. Sometimes these are in the form of business challenges. Sometimes they're things that we're stuck with that we chose ourselves and are the way we want to go, but we have to fight through them to get where we're trying to get to. And I think the point of this is: don't get frustrated. Don't feel like it's impossible to get to a better place. This is about some tools to get there. So let's talk a little bit about my dream state. All right. So here we go. Jacob's dream state. I may like wine a little bit. So every organization has to figure out their dream state. This may be my personal dream state you're looking at right now, but it's not the Bloomberg technology dream state. You need to figure out what that is, and you need to keep it current. Keep it updated. What's true now isn't going to be true forever, and a lot of the technologies that you and your company have adopted are going to be irrelevant in a couple of years. It's just a matter of change. There are also different stakeholders who contribute to that. It isn't just infrastructure teams. It isn't just the business. It isn't just application developers. It's a whole bunch of things. So here's a sanitized version of what I think mine would be. I really like open source software.
Maybe it's a little obvious at an event like this, right? Because there's a little bit of it floating around here. It's right there in the name of the conference. But we at Bloomberg like using and contributing to open source software. We write a whole bunch of our own stuff as well, and we contribute to a lot of projects, and we do use some things that aren't open source, but we try to balance those against everything else. And we try, any time we're solving a non-Bloomberg-specific problem, to take advantage of what's great in the community and be part of that. Some of the logos you see here represent things that we use and contribute to. Some are things we're just exploring. Some are things that I just find intriguing. But the point is, it's fun to be able to work as part of a community rather than just be a customer. And it gets back to a phrase I like, which is "community, not contract." It's a different relationship. So the second thing would be commodity open hardware. I think it's interesting to see how much this is just starting to take off. In the server space, it's actually been around a while. You can move away from buying appliances and solutions where you can't really open the box and see what's going on, towards a place where you can. And you can join like-minded people who want something different than what the retail universe is willing to sell them, and make stuff. This was obviously pioneered by Facebook with the Open Compute Project, but there are a bunch of different efforts around networking and storage as well as the core compute products that started out there. And the combination of open source software and commodity hardware can be incredibly powerful. So, horizontally scaled applications. We get to the application piece, which is really tricky. Having applications which are well built to work in the kind of environments we want to live in is critical. And horizontally scaled applications tend to be the ones we would like to see.
These tend not to be single-instance. Sometimes we're talking about distributed or clustered systems. Sometimes we're just talking about taking advantage of long-standing, existing concepts around HA. But the point is: don't build single points of failure. Design services-based applications that scale horizontally and take advantage of compute in different sizes. What we offer in our cloud is somewhat limited. And we've deliberately limited it, in an attempt to take advantage of the hardware that's available, but also to keep people from building vertically scaled systems. It's helpful to nudge people a little bit with the things you make available to them. So that gets to an adjacent concept, which is these cloud native designs. There are a whole lot of different names and a whole lot of different buzzwords: microservices, the twelve-factor approach, all the things. There's a general push around statelessness, which usually ignores the fact that while many pieces of your application can be stateless, there's always some essential state in most interesting applications. But get the people who build the applications at your company to think about how they can take advantage of the behaviors of a cloud environment, which are going to be a little different than perhaps the technology stack that they originally built on, especially if their applications have lasted a long time. This is kind of an interesting thing, because much as this very large-looking aircraft is not inherently comfortable in the sky, and requires a little bit of help in the form of four very large engines to fly, a lot of the applications that exist in a world like ours didn't originate in this world and require some kind of assistance to get there. So there's a fair bit of re-engineering work that needs to happen for them to really be cloud native. So that's kind of the 2017 infrastructure dream. Sounds great. I'm sure everything that I have looks just like that.
And there's nothing that doesn't fit into that box; it's all perfect and easy and simple, and we're flying cars and robots making our coffee and that little beer cart that finds us and delivers one as soon as we're ready. That's actually not true. The reality of any large company, any company that's been in business for 10, 20, or in our case 35-plus years, is that it's going to have tech debt accumulated. It's not a bad thing. It's simply the reality of having built stuff. Successful companies that scale quickly and grow really fast are sometimes going to generate a certain amount of tech debt. If you're well behaved, you're probably good at cleaning it up as you go along, but the chances are good that you've had a bunch of times when you haven't had the opportunity to do that. In addition, there are probably times when technology changes have happened as you've been working at something, and something that was a great design in 1997 probably isn't as great a design in 2017. And so it's become tech debt rather than having been created as that. It accumulates. And when you have millions of lines of code, or tens of millions of lines of code, or hundreds of millions of lines of code, you're going to have to touch it every now and then, maybe not fully refactor it, but at least go in and poke at it. And that's a huge amount of work, even if you're a very large company. So actually refactoring everything to be cloud native when you've got tens of millions of lines of code could be a stop-the-presses, don't-launch-anything-for-five-years kind of effort, which is unlikely to ever be something that you could do as a business. It'll take months or years, or maybe more than years. I don't know. So how do you tackle it? Well, there's no magic wand. There's no secret that I can tell you. It's actually a whole lot of boring individual things. Here are some of the things that we've been trying and some of the things that we think work.
And it's a matter not of the silver bullet, because there are no silver bullets, but of a whole lot of lead ones. So the first one is embracing incrementalism. What is incrementalism about? Well, the idea is that you're not going to tackle everything at once. If you have a redesign that is required for your applications to move to an infrastructure, and you have to rebuild every part of a large, complex, high-risk system at the same time, you're probably not going to do it. Or if you do it, you may rebuild for five years and then flip a gigantic switch and really hope everything's going to work okay. My experience has been that those "really hope everything's going to work okay" moments usually end in tears. And tears is the cleaned-up version of that sentence. So incrementalism is about breaking it into bits, taking a smaller chunk at a time, and having the overall infrastructure work at each stage, so that if you get dragged off to a different fire in the middle, you're not in a place where everything is broken. So the first piece of this is: you don't have to do it all at once, but at least try to stop doing things that go the other direction. You probably have a sense of what these are. Once you make a decision to move in a certain direction, as you start to look at other projects going on, you'll see a slew of them that are probably going in some other direction. And they'll take you further away from the ideals. So if you can, try to share the vision of what it is that you're trying to do. Have other people understand why you're doing it and where you're going with it. And try to convince them that they don't want to go in that wrong direction. That they're just creating more work for themselves later, that they're potentially adding additional risks, that what they're doing may not actually benefit the larger company. You won't convince everyone. You never can. But if you can at least do your best to try not to create more to clean up later, that's a plus.
And over many years, it actually becomes a pretty big deal. The second piece is: you're going to have a lot of legacy stuff. In the beginning, it'll all be that. 100% of what you have will be stuff that probably isn't the way you want it. And you'll have to start building something in the shiny new microservice amazingness, or whatever it ends up being by the time you get around to it. There are probably technologies not even invented yet that will make that whole technology stack look dated. The point is, when you first build it, it's going to be a little island. And you're going to have to find ways to have that little island interact with the rest of the universe. And probably you're not going to be quite sure whether that new thing is actually ever going to replace all the old stuff. So one of the early projects you can do is to start to fence off some of the old things. Build interfaces, build APIs, build ways to connect into that world, so that you don't actually end up having to touch all that stuff, but you also don't have to import all the things you're trying to get rid of into all the new things you're building. And then, back to the first point about trying not to make things worse while you're making things better: there's a fair amount of cruft that comes from doing this. You're going to end up building facades in front of things, and translation layers that you will probably end up throwing away at some point. That's fine. You have to take a breath and get over that fact, because part of incrementalism is understanding that you're building scaffolding. And scaffolding goes away before you open the building. But through the process of construction, you need it, partly because it's the only way to get at the things you want, but also for safety. And building these kinds of interfaces does permit you to connect very different systems together.
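The fencing-off idea above can be sketched in a few lines. This is a hypothetical example, not any real Bloomberg system: `LegacyBilling` stands in for old code you don't want to rewrite yet, and `BillingAPI` is the throwaway translation layer, the scaffolding, that new services call instead of touching the legacy code directly.

```python
class LegacyBilling:
    """Old interface: terse names, positional flags, cryptic return codes.
    We fence this off rather than refactor it."""
    def chrg(self, acct, amt_cents, flg):
        if amt_cents <= 0:
            return -1          # legacy error convention: negative = failure
        return 0               # legacy success code


class BillingAPI:
    """Facade: the clean interface new code depends on. When the legacy
    system is eventually replaced, only this class has to change."""
    def __init__(self, legacy):
        self._legacy = legacy

    def charge(self, account_id: str, amount_dollars: float) -> bool:
        # Translate modern units and types into the legacy calling style,
        # and translate the legacy return code into a plain boolean.
        rc = self._legacy.chrg(account_id, int(round(amount_dollars * 100)), 1)
        return rc == 0
```

New applications see only `BillingAPI.charge`, so the legacy conventions never leak into the new world, and the old island stays reachable without being rebuilt.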
So while it may potentially seem like wasted effort, the kind of throwaway work we all try to avoid, it's genuinely helpful in being able to tie things together. Another thing: as you touch things for other reasons, unrelated to this particular project, you'll probably have opportunities to go in and deal with things. You may not want to do a full-scale remodeling, but if you're going in to do one thing, fix a little bit every time. There's a whole lot of bits and pieces that you need to do something about. You're not going to get it all done at once. But if you're going in and cleaning up a security problem, you may want to take a look at some of the API layers you have. If you're going in and changing a backend database, it may be an interesting opportunity to question whether this is the right way to store your data. If you're doing a database access layer project, it may be a good time to think about whether some of these bits and pieces can be moved into cloud environments. Taking little bits and pieces that ride along with other projects means you're not going to have to justify 100% of the engineering work as part of the migration to the new shiny, perfect world. It also means sometimes you'll learn about the project from a different perspective. You'll see whether the work that you're doing is actually helping, or hurting in some cases, the overall effort of trying to move in this direction. So if you can gradually finish entire segments and check them off, that's great. But even if you can't, just being able to do little bits and clean things up helps a lot. So the second of the big ideas here is this "engineer more, operate less" concept. This is an elegant staircase design. And some administrative tasks feel an awful lot like this staircase, like you're climbing something endless. And when you get to the top, you may go nowhere.
What was just painful work in your pre-cloud world could be fatal work, could be dangerous, could be impossible in your post-cloud world. Let's say that you have tasks that involve a certain number of human beings, and then you've increased your node count massively because you've gone to smaller, horizontally scaled systems. Before you know it, you're in a place where you can't keep up with that workload anymore. Or it's all become ephemeral machines, and there's no way for human beings to get in and flip the switches and do the doodads. So what do you do with this? Well, you can operate things, you can administer things, but you can also build automation around that. And if you have an opportunity to go in and do some of that, it's an interesting way to reduce the amount of pain. So: automate everything as you transition to cloud. It's kind of a very pat thing to say, right? Because we've heard about the magic of a lot of these things. I like the idea of the automatic donut machine. It is my continuous integration, continuous deployment model of choice. Partly because I like donuts. They don't go as well with the wine, but they're very tasty. But also because the idea of them is very interesting. There's a batch concept to them, but there's also a continuing flow. There's the ability to think about releases and rollbacks and things like that as well. The point is, this is the time to go in and start to add this kind of technology. It's going to facilitate what you're trying to do if you didn't already have it. It'll make your testing easier. It'll make your ability to run parallel systems easier. It's something that no one ever has time for, but it has enough benefits for your overall engineering group that it's worth doing if it hasn't already happened. Another big piece is packaging. If there's ever a time to get software packaging into a better place, it's now.
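The automation point above, replacing the manual switch-flipping that can't keep up with thousands of ephemeral nodes, can be sketched as an idempotent "converge to desired state" function. This is a toy illustration, not a real configuration management tool: the node dictionaries and the `ensure_service` / `converge_fleet` names are hypothetical.

```python
def ensure_service(node, name):
    """Converge one node so that `name` is installed and running.
    Idempotent: safe to re-run on every node, any number of times,
    because it only acts when the observed state is wrong."""
    actions = []
    if name not in node.setdefault("installed", set()):
        node["installed"].add(name)
        actions.append("install")
    if name not in node.setdefault("running", set()):
        node["running"].add(name)
        actions.append("start")
    return actions  # empty list means the node was already correct


def converge_fleet(nodes, name):
    """Apply the same step across the whole fleet, no humans required.
    Returns what was done per node, which doubles as an audit log."""
    return {node_id: ensure_service(node, name)
            for node_id, node in nodes.items()}
```

Idempotence is the key property: because re-running is harmless, the same code works whether you have ten long-lived servers or ten thousand ephemeral ones that appear and disappear.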
If your systems aren't packaged neatly, deployments are going to be really messy. Your ability to recreate stuff is going to be hard, and you're not necessarily going to have an elegant deployment model. There are a lot of different ways to go with this. Certainly a lot of software is still packaged as native OS packages. Some people do things with containers; about a third of all the presentations here seem to be somehow related to that. And there are other options too. The point is: choose something. Choose something that works. Choose something you can automate. Choose something which actually helps your engineers deliver stuff faster. If you can rebuild all of these things in an automated way, you get the benefits. You get the benefits because you can rebuild those ephemeral instances, if that's the way you want to go. Whether you need to do testing because you're moving functionality over, or even just for the reduction in the complexity of doing deployments, it's a pretty beneficial thing. That's not to say this is an easy thing. This is touching a lot of infrastructure that you've probably avoided touching for a while, but it's a helpful piece. And then there's a last thought in this area, which is that you need to be a little ruthless about some of the bits and pieces that you have. You shouldn't actually try to move everything to the cloud. There are going to be some things that just aren't going to make it. What's a good example? Well, inflexibly licensed software is my personal favorite. If you have software that has physical dongles that manage its licensing, it's probably never going to make it, and the amount of time and energy you'd spend fighting with it is probably not worth it. Weird proprietary networking schemes that are physically unique to a product are another good example of something that's probably never going to happen.
And there's probably a slew of things like this: something that requires unique hardware, or something that requires an unusual physical presence. There are a bunch of cases where it's just never going to happen. So what do you do with these things? Well, we've talked mostly about refactoring up to this point, but there are places where you just need to go in and think about wholesale replacement. And that's a scary, time-consuming, and unpleasant thing. So my basic feeling is that on a lot of these things, you punt. There's an 80-20 rule to a lot of this, and yes, you're going to get tremendous benefits by moving infrastructure to something that's more programmable, and I'm sure you'd like to have everything in that world. We would, certainly. But you're going to get a lot of the benefits by going after the things that are a lot easier to go after. If you can't engineer your way around touching some of these things, or at least leave them for last, you'll end up fighting, and potentially dying, on a project that may be a very small part of the overall system that you have. Getting back to the earlier section where we talked about fencing things off: you can ultimately fence off some of these weird dependencies and strange technologies and leave them. Maybe abandon them for a future day when you'll have an opportunity to tackle them. Maybe wait for other technologies to supersede them. Maybe wait until you have the time to do the really unpleasant heavy lifting necessary to port that functionality over for your business. But just because it's there doesn't mean you have to try to do it. The desire to check off every box and finish everything shouldn't sign you up for what could be very difficult work, or at least a poor return on your time and energy. There's usually so much stuff to do. You can save some things for last and come back when maybe the situation's better. So the third big section is about tearing down organizational walls.
People have kind of accepted this Conway's law idea that systems resemble the organizations that build them. And infrastructure is not really an exception. We build a lot of organizational lines around who owns infrastructure, who is permitted to touch something, who is not permitted to touch something, and what the interfaces between the two are. It may be in the form of a ticketing system or a budget request, or something very slow and inefficient. Engineering groups, testing groups, operational groups, IT groups: different organizations have different names for them. They all tend to have a certain amount of shared common purpose, but often there's a certain amount of distrust, or at least a difference in organizational vision or purpose. With a lot of these technologies, you're getting to a place where you're going to be giving people a lot more power, or at least you have an opportunity to do that. And the question is, what do you do with it? My advice is: just tear down those walls. Now how does that work? Well, first, everyone's got to learn how to use this stuff. If the only people who learn about the new technology are the group that builds the cloud, or the only people who learn about your CI/CD are the team that builds that system, you're not really going to get a lot of people buying into it. So teach everybody, right? Training and education are pretty essential to successful adoption of any of these things. Teach people how things changed. Teach them why it changed. Talk about the other things you considered, how you got to your conclusion, and why this is something that needed to happen. Different organizations may have different ideas about things, but they probably have the same upstream business drivers pushing them to go faster, or manage their costs differently, or test better, things of that nature. Talk about what's actually important there and why people wanted to do something differently.
Everyone should get the same message. It shouldn't be just communicating to each organization about their part. It should be communicating the overall picture to everyone, so we're all on the same page, because you need to share a lot of information. Once you build things, the other thing that you need to do is be transparent about what isn't working. You'll probably find that things break from time to time, or at least perform differently than they did before. Share your metrics, share your logs, share your operational data to the extent you can. Share configurations, share your changes. Be open about what's happening in your world. Part of it is a trust thing. You're changing a lot of the things that people really understood and knew. If you know what a server is, and you know what a virtual server is, and those are static things in your world, a cloud is a much fuzzier concept, and it may be very hard to get people comfortable with things happening that they don't understand. So just share. Also, if you build systems that collect and expose these things, you'll probably see other people use them. Now, that doesn't necessarily mean you need to build all these things as services for everyone, but certainly a certain amount of transparency and directness can help build some common patterns of good things to do as we build applications on top of infrastructure. It is very comfortable, and quite frankly very easy, to point fingers at what changed any time something goes wrong. If we update something, that's the first thing we're probably going to look at the next time something goes sideways. And it's probably the right thing to do. The last thing that changed is often the reason for a problem. But the more you actually share information, the less likely you are to have uninformed finger-pointing about that kind of thing. You'll have the data to show what did and didn't happen.
And I think the more you share those across engineering teams, the more you can build a certain sense of trust. I guess the last interesting piece comes back to a theme that I imagine is not all that unfamiliar. You want to enable a certain amount of self-service and trust people to do the right things. It's important that you give due consideration to the fact that the people who are working in these environments are probably also engineers. They're also fairly smart people. If you've taught them enough, they're probably able to not be a danger to themselves and others. Sometimes things will go sideways no matter what, and you need to help them learn what happened when they do. But we're evolving into a world where there isn't necessarily a set of trusted people who are the only guardians of system stability. The people who build applications and the people who run infrastructure both have a vested interest in the success of the overall environment, and you have to work together to actually get things done. These shouldn't be super unfamiliar principles. If you've been in an organization which has talked about DevOps any time in the last decade, these are probably a lot of the same concepts you've talked about. But this is not a theoretical statement or a new kind of role. This is actually about handing the keys over to people who you may not have trusted to have that kind of control, and dealing with the consequences of that, which feels really icky if there isn't trust and communication between organizations. The thing that's pretty difficult about this is that you do have a lot of history where people built barriers between each other. The little poster here in the middle has the bank teller and the patron on opposite sides of that window. And this is a model that a lot of organizations built. There were people who were empowered to go in and touch the magic things in the vault.
And there were people who were somewhat distrusted as participants on the outside of this. And I think the key part of tearing down those walls is to acknowledge the fact that we may not have perfect trust. We may not have a completely open environment, but we're all in this together, and we're trying to work together to achieve whatever our business objectives and goals are. So the last stage of this is about helping all the people in your engineering organization who are taking part in this to help themselves: to learn about the metrics, to learn about the automation that's been built, to understand how all these different changes are going to affect them, and to have a voice in talking about what they do and don't want. There are going to be people who are not comfortable with this, who may object and may opt out, and it may actually be a fairly long period before everyone's on board. But without a certain amount of shared experience and shared fate, you end up in a place where this is something that is getting pushed out, something that's getting inflicted upon engineering organizations that have their own agendas, their own things they have to do, things that need to be delivered to the business. And for most of us, these all fit together, right? The work that we're trying to do is part of that same mission. And being focused on that is why we're here. So thank you for your patience. We've reached the far end of the crazy rainbow. There's a lot of stuff that we're still working on, and I think it'll be a number of years yet before I would say we're anywhere near done with what we're doing with the cloud. In the last few years, we've gone from the proof-of-concept phase into something where I would actively say we're using our cloud for real production applications.
The most exciting milestone of this is that when we've had occasional outages or blips, we now have people who are legitimately really mad about it, versus just kind of triflingly inconvenienced, which is a nice sign that someone actually cares. But I think the point is that whether or not you have rainbows and unicorns as part of your system design, being able to manage the transition and deal with the difficult parts is how you get to a place where you can take advantage of the infrastructure that we're talking about building here. Thanks.