Are we on? Are we good to go now? Yes. Okay. Thanks for coming, everybody. My name is Mike Waters, and I'm going to be giving you kind of a one-year update, partnering here with Steve Wall; he'll be up in a few minutes. Last year we gave you a preview of what DigitalGlobe was planning on doing, and this is where we're at right now and the lessons we've learned. We're trying hard to impart what we've done, share the lessons, and get feedback from you all.

This is a shot of Mount Fuji. We have a constellation of satellites that image the Earth, so if you've ever seen Google Maps, Bing Maps, Apple Maps, any of those, we supply almost all of their imagery; that's what our company does. This is a fun picture: Mount Fuji as it came over the horizon, with our satellite way off to the side, so it's a really awesome profile view of Mount Fuji.

A little quick background on what DigitalGlobe does; we'll fly right through this. We have a constellation, like I said, of satellites that image the Earth. We collect about three and a half million square kilometers of imagery a day, which is about the size of India, or, if you like little things, 21,000 times the size of Liechtenstein. We downlink about five to six terabytes of brand-new data a day, and that's highly, highly compressed. We turn that into about 40 to 100 terabytes of new products per day that we ship out to customers. So on average we add tens of petabytes of stuff every year to our archive. We have tons and tons of tape and over about 60 petabytes of spinning disk at our shop, so it's a huge I/O company. We also have a platform in the cloud where you can bring your algorithms to our imagery, so we don't have to ship you hundreds and hundreds of terabytes of stuff; you can come and torture our imagery in AWS. We have a few satellites up, two of them have been retired, and we've got four other ones; we'll just fly through this. So what do we do? In 24 hours we collect that much, and over seven days, one month, six months, a year, we can paint pretty much the entire Earth several times.

So what happened? We needed a new architecture. We had a monolith; insert monolith story here. Everything was integrated through a database, and it took forever to get anything new into production. That led us to do lots and lots of fun, unnatural things, probably like everybody else, where you'd strap a bicycle to a wagon to get it deployed to production, just because that was easier than actually getting something new deployed. Everything was very project-focused at our shop, too: if you wanted to get anything done, you had to tie it to a project, and then every project would twist the monolith to its own desires and collide with other projects. It was not good. But we had a new opportunity with our next satellite, WorldView-4. It's going to be launching in September of this year; things can change, but that's what it's scheduled for now. Upper management gave us the green light; even they were tired of the old system. The cost to twist the old system to meet the needs of a new satellite wasn't quite approaching the cost of the satellite itself, but it was getting ridiculous and not credible. So we got to start over. What are we going to do?
So we, as the enterprise architecture team, knew that building a PaaS was going to be key. We had to have something that just let developers do the best work of their life and not worry about: has IT provisioned my VM yet? Has IT done my F5 rule yet? They should be writing lots and lots of business value; that's what they should be doing. So we surveyed the landscape and created some knockout criteria for a PaaS, what we needed it to do at our shop: if it doesn't do these, it can't work here. We bashed that against the list of PaaSes that were available. Cloud Foundry was chosen as the leading candidate, so we did a few quick prototypes. We verified those knockout criteria and did things like pulling out DEAs and doing rolling upgrades, all that sort of stuff, to make sure it never really went down. Our major languages are Java, Ruby, and Python, so we created little sample apps and ported a couple of real ones just to make sure it would work. We built some staffing and pricing models and went forward.

So this is kind of the path we found ourselves on; I know it's a poor graphic. The pioneers came first. We stood up one team, called them the pioneer team, and their job was to go get bitten by the rattlesnakes, step on the cactus, and figure out how to do all this stuff. It was a great exercise. They learned a lot, and we started this learn, fix, adapt cycle. It went really well: what they learned, we would feed back and fix, making the painful things better; we'd go back and remove the pain. Then we expanded beyond the pioneer team to another team and started developing a little bit more code, and a little bit more. Once you add a few more people, you uncover way more problems: "Well, I just asked Bob to do that." Well, let's automate Bob now, right? So we kept learning and adapting, and now I think everyone is developing code, with almost all of it targeted to run on Cloud Foundry. We have a few things that don't run well on Cloud Foundry; we'll get into that in a few minutes. But almost all the apps are built and designed to run on Cloud Foundry.

One of the lessons we're learning is that vocabulary is important. You'd be talking with somebody about how they're developing something and think: that will never run. What are you talking about? That can't run in Cloud Foundry. You're talking about SAP, right? I'm not gonna do cf push SAP and be done. And they're like, no, no, I'm running that on a VM. Oh, okay, we got our wires crossed. So we came up with a vocabulary of app patterns. Pattern one is a 12-factor app; it's going to be running in Cloud Foundry. Pattern two means your VM. Pattern three is bare metal. With all the big pixel data we push around, we have huge HPC clusters that compute on this imagery, so we have a lot of need for bare metal. Just this little nomenclature has caught on and everybody knows it. "What are you talking about? Pattern one." Okay, great, you've got a whole context around that and you can just move on. It's kind of funny, though: we need some people to run our operations now, so they've put out some reqs for hire, and I've seen in a req, "must be familiar with pattern one, two and three apps." Like, that's us, that's not the industry.
No one is gonna know what that means. So, some of the things we learned in these learn, fix, adapt cycles; our goal here is to help you out, so maybe you won't get bitten by the same rattlesnakes and step on the same cactus that we did.

Microservices sprawl fast. You give developers the ability to cf push, and push fast, and push anything, and it goes gonzo so fast. We always had a plan: yeah, we're gonna do Eureka, or we're gonna do Consul, we're gonna do something for service discovery. But the sprawl was a massive forcing function; we had to get that up and running quickly just because of how fast everything sprawled.

Centralized configuration. People were pushing apps, and some people were configuring through CF environment variables, other people were bundling property files into jars, and all sorts of cats and dogs. So we said no, we're going with Spring Config Server. We updated the Spring Config Server to have a Postgres back end; it'd be nice if we could contribute that back to the open source community. It gives a really nice way for everybody to attach to one place and grab their config. (There's a rough sketch of that idea below.)

We also learned that API management is hard; I got angry emails just a few minutes ago about API management, still. We're learning, but we're using a legacy product we had in-house from Software AG, CentraSite, where you can track your services and who's consuming them, so you have at least some dependency map of who's dependent on what and why. That's been helpful. We're also using a tool called Apiary; I don't know if anybody's used Apiary, but it's a good design tool on the web and we've bought into it pretty heavily. It makes it nice for testing your APIs and gives you a pretty easy way to do some markdown to define your API. We're still learning; we don't have any golden magic sauce here yet, but those are the tools we're using now and they seem to be doing okay. We even have Software AG's product integrated into our pipeline now, so the first time you hit deploy on an app, the pipeline checks with CentraSite and goes, "I have no clue what this service is." You can't proceed past go until you tell CentraSite what it's all about.

We're also learning to decouple code deploy from feature deploy. This is critical. For the people who aren't so familiar with it: it's the zen art of having your continuous delivery pipeline continuously delivering but not actually turning on new features in production, with a very controlled way to turn a feature on. The code just keeps flowing like it's supposed to, but you can configure that new feature on in, say, a user acceptance test environment, where people can poke around, test it, and like it, and then there's no big-bang production day. The code's already there; you just turn the nuclear launch key to on and your feature is live in production. We're still learning here. We're trying to use a feature flipper for Java, FF4J, and we're trying to enhance it to use our centralized configuration server, so we have centralized config of the apps and centralized configuration of all of our features. That's the end goal; hopefully we get there. (There's a small feature-flag sketch below too.)
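To make the centralized-configuration piece a bit more concrete, here's a minimal sketch of a Spring Cloud Config server backed by a relational store. Our actual change was an in-house Postgres back end; the sketch assumes the stock JDBC environment repository that Spring Cloud Config ships today, which gets you something similar, and the property names and table layout below follow that repository's documented defaults rather than anything specific to our code.

```java
// Minimal Spring Cloud Config server: every app attaches to this one place to grab its config.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

@SpringBootApplication
@EnableConfigServer
public class ConfigServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ConfigServerApplication.class, args);
    }
}

// application.properties (illustrative; stock JDBC backend rather than our in-house Postgres one):
//   spring.profiles.active=jdbc
//   spring.datasource.url=jdbc:postgresql://configdb.example.com:5432/config
//   spring.datasource.username=config
//   spring.datasource.password=${CONFIG_DB_PASSWORD}
//   # By default the JDBC repository reads rows from a
//   # PROPERTIES(APPLICATION, PROFILE, LABEL, KEY, VALUE) table.
```

Client apps then just point their bootstrap configuration at this server's URL and ask for their application/profile pair, instead of bundling property files into jars.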
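And here's the feature-flag sketch: a tiny FF4J example of shipping code dark and flipping it on later. The in-memory feature store and the feature name are just for illustration; the end goal described above is to back the flags with the config server, so a flip is a config change rather than a redeploy.

```java
// Sketch of decoupling code deploy from feature deploy with FF4J.
import org.ff4j.FF4j;
import org.ff4j.core.Feature;

public class FeatureFlagSketch {

    public static void main(String[] args) {
        FF4j ff4j = new FF4j();  // in-memory store, illustration only

        // The feature ships disabled: the code is in production, the behavior is not.
        ff4j.createFeature(new Feature("new-delivery-flow", false));

        if (ff4j.check("new-delivery-flow")) {
            System.out.println("Routing order through the new delivery flow");
        } else {
            System.out.println("Routing order through the existing flow");
        }

        // Flipping the feature (say, after UAT sign-off) is configuration, not a release.
        ff4j.enable("new-delivery-flow");
    }
}
```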
Some other things we've learned: standards, standards, standards are your friends. Some people think the Wild West is okay, just let people do what they need to do, but we found a ton of value in having every app expose the same standard endpoints. Your / endpoint is a very standard endpoint in what it returns: who you are, what build you are, a little bit of information about you. The /status endpoint checks your immediate dependencies, like your database connection or a dependency on a file system, whatever it is. The /healthcheck endpoint checks your remote dependencies: if you depend on a remote service, it'll actually go out and hit the /status of that remote service. So it gives us a very standard way: when things are deployed, I know I can go to /, see what it is, go to /status to see if it's kind of healthy, and go to /healthcheck to see if it's really healthy. The monitoring teams are integrating all these calls; it's been pretty nice. (There's a rough sketch of these endpoints at the end of this section.)

Our build pipeline: we're actually rebuilding our build pipeline. We tried to do everything in Jenkins, soup to nuts, and that didn't work out too well, so now we're trying to choose the right tool for the right job. Jenkins does builds really well, but XL Release from XebiaLabs orchestrates releases pretty well, so we're doing that now. It's a big effort to redo our pipeline.

And one huge landmine, if I can keep you from stepping on something; at least this was big where I work. We have a bunch of things, like the config server, that are common for everybody to use, so we created this notion that there are all these "common services" out there. Then people would say: wait a minute, the thing I'm writing is supposed to be the one place the company comes to get this, so isn't that common too? Well, yeah. But that makes everything common, right? You should only write things once; you should only write a function for the enterprise once, not ten times. So technically, every service is common. We decided it would have been a much better path to call the config service and things like that "utility services": if the utilities are down, everything's down. Calling them common services just caused so much confusion and chaos.

So what's the current state at DigitalGlobe? We have open source Cloud Foundry running for dev/test, with over 800 services running in there. They're not all unique; developers are doing their own thing, and there might be 20 copies of the same thing out there because they're in developer spaces supporting development. Here's kind of an interesting fact, and I don't know if anybody else has any history on this: we think that when we're done, we're gonna have between 60 and 80 microservices for this first big release that's supporting the launch of the satellite. So there's an order of magnitude difference between how many will be running in production versus how many are running just to support all the developers. It'd be interesting to hear if anybody else has numbers like that. Our DEAs are just two CPUs and 16 GB of RAM, with 3x overcommit on memory; we found that running that many apps, we just couldn't scale with a 1x overcommit, it was ridiculous. We're integrated with the ELK stack for our logging, and we're currently using log drains bound to every app, but we're looking at the Firehose, which makes that a lot easier; we actually broke ground on that last Friday, and right next door our friends are talking about it in another session.
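Here's the rough sketch of those standard endpoints, written as a small Spring Boot app since Java is one of our main languages. The app name, JSON shape, and the stubbed dependency checks are illustrative, not our actual template app.

```java
// Sketch of the standard /, /status, and /healthcheck endpoints every pattern-one app exposes.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.LinkedHashMap;
import java.util.Map;

@SpringBootApplication
@RestController
public class StandardEndpointsApp {

    // "/" : who you are, what build you are.
    @GetMapping("/")
    public Map<String, String> info() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("name", "delivery-service");   // illustrative values
        info.put("version", "1.4.2");
        info.put("gitCommit", "abc1234");
        return info;
    }

    // "/status" : check immediate dependencies (database, file system, ...).
    @GetMapping("/status")
    public Map<String, String> status() {
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("postgres", pingDatabase() ? "UP" : "DOWN");
        return deps;
    }

    // "/healthcheck" : check remote dependencies by hitting their /status endpoints.
    @GetMapping("/healthcheck")
    public Map<String, String> healthcheck() {
        Map<String, String> remotes = new LinkedHashMap<>();
        remotes.put("inventory-service",
                pingRemoteStatus("https://inventory.example.com/status") ? "UP" : "DOWN");
        return remotes;
    }

    private boolean pingDatabase() { /* e.g. run "SELECT 1" against the bound database */ return true; }

    private boolean pingRemoteStatus(String url) { /* e.g. HTTP GET and expect a 200 */ return true; }

    public static void main(String[] args) {
        SpringApplication.run(StandardEndpointsApp.class, args);
    }
}
```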
In production, we have Pivotal Cloud Foundry running on OpenStack, and it's up and running. We have a few services that have been kind of snowflakishly deployed out there, because we're rewriting our pipeline and we didn't want to go back and rework the entire old pipeline to deploy to production while we're replacing it. So we just hacked a couple of things onto the end to deploy a few things to production, and it's running pretty well so far.

Some of the wins we've had: development speed. Once we had those patterns down, we could talk about them: pattern one, two, three. And for the pattern ones, we created a template app in GitHub where you basically download the zip file, unzip it, create a new repo, and you're on your way. You change the name of the app and it takes care of the rest; it's "fill in here" for /status, /healthcheck, all of those. It's great, and it's easy to onboard developers and new team members that way. Ease of development: we've built lots and lots of self-service portals, and that's been really, really key. If you're thinking about doing this, self-service is key. One thing we've learned, though, is that self-service gets out of control, and in order to self-service yourself to a new service you've got to go to 30 different places, so we're gonna have to have a self-service portal consolidation effort here sometime.

The visibility is great in Cloud Foundry. What's running? Let me just do a cf curl against the apps endpoint and I'll tell you exactly what's running; the monitoring team does that. Once you can see everything, you can audit everything, right? As soon as we find new services, we can send off Nimsoft alerts: who's doing what and why? Why are new things popping up in production, or wherever? And when you get visibility and auditability, you get control of your environment. This has been huge for us: you know what's going on, you know what's deployed everywhere, and alarms can go off. On top of the Nimsoft alerts being sent out, we're actually automating ELK to the point where it'll auto-create a new dashboard for an app as soon as it appears.

The testing groups are just ecstatic, and that's mostly because of microservices. We used to have a monolith where, if you wanted to test delivery of a product, you had to put an order in and wait for the whole thing to go through the Pachinko machine to see if it actually got delivered. Now, with microservices, you can just test delivery on its own. Another big win is resiliency. Holy cow, we've had compute nodes fail in OpenStack, burned-up CPUs and stuff, and none of the Cloud Foundry users even knew. Me being one of the admins, I look at it and go: why do I have new BOSH VMs? What's going on? You start peeling back the layers and find a big compute node that has vanished. None of the Cloud Foundry people even knew, and this has happened several times. So we're very happy with the resiliency we've gotten.

Some of the challenges we've had: synchronization across foundations. We're doing two regions, if you will; we have two data centers that are geographically diverse, but synchronizing across them is becoming problematic. How do you have one source of truth for UAA when you've got two different foundations? Best practices around load balancing with F5s across foundations are kind of few and far between, at least what we can find freely available. So we're trying to learn here.
SSL: one thing we'd really like to do is have a domain per developer, so you can be segregated off on your own domain. But just due to the nature of the work we do, even in development we have to do HTTPS everywhere, and as many of you may know, you can only have one cert, so you have to have a zillion SANs, subject alternative names, in that cert. It's kind of a non-starter to rebuild that massive cert every time a new developer starts. It'd be nice to just be able to point at a list: here's a bunch of certs, go serve up all of these.

Open source Cloud Foundry has no support for HA out of the box. For these development environments we wanted some level of resiliency, so we've had to kind of hand-roll our own HA BOSH deployments and stuff like that, and the ECS team has helped us out there. Great stuff.

Developer and DevOps access to spaces. That's been interesting, because when you're using these log drain bindings, we've noticed it sometimes takes a while to get logs out, and when you first deploy an app and it dies in a test environment, it was hard to get the logs out. So sometimes the developers needed access to those spaces, and the fine-grained access control just isn't there to give them only a view. SpaceAuditor wasn't enough, but SpaceDeveloper is way too much power: they can create a snowflake and all that stuff. So that's one of the things we're dealing with. There's also some tension between the microservice architecture and the licensing models from the vendors. Every time I do the right thing and break things down from one service into ten, it actually costs me a lot of money, no matter which product I'm on.

Change is hard; be ever vigilant. The old ways of doing things are gonna come back. You're gonna have a developer trying to cram their own custom-built Perl into their app. Yes, real story: "I need to package this entire Perl distro with my app." What? Why? Get somebody in executive management to back you. We've had many times where people just didn't want to do stuff and we've had to play the "the C-whatever said so" card: I'm really sorry, but you have to get on the train. And for me, I'm on the architecture team; we had to totally put it on the line and fight for this. The organization really, really resisted change, but you put it out there, you put your badge on the table: this is the penalty if I'm wrong, take it away from me. That's kind of what we had to do. But it's been great.

And I think that's the last one; just one more on our future needs. We actually have a need for an OEM style of Cloud Foundry deployment, where I can put it at a customer site and just turn it on for our entire system. I don't think anybody's doing that right now, but if anybody else is thinking about that, I'd like to talk to you. Managing multiple foundations as if they were one, that sure would be nice; if anybody has any insight into that, or any open source tools that help with it, we'd love to hear from you. And persistent storage. Oh wait, they just did that. Yay, Diego has it. Awesome. And with that, I'll bring up Steve and he'll talk to you about kind of a day in the life of a developer.

Hi, thanks, Mike. So I'm gonna talk to you a little bit about what our delivery pipeline is and how it looks for developers. When developers start off, they have their local dev environment.
They'll check in to GitHub, pretty standard stuff, and before they check in to GitHub, they can use a dev environment. So we created a dev organization; each one of these boxes here is an organization within Cloud Foundry. They have their own personal space, their own sandbox they can use, and then they check in to GitHub, which triggers a build. The first stage is kind of the standard build initially: you compile, you do your unit testing.

Now, moving into the microservice world, one thing we wanted to do after the standard build created your archive was make sure that archive was deployable into Cloud Foundry. So we have this functional test organization. After your application passes the unit tests and there's an archive, we create a space within the functional test environment for your microservice. It's a clean space, and we deploy that archive into it. In the functional test environment, the dependent microservices are mocked out (there's a small sketch of that idea at the end of this section). We want to make sure that that microservice works on its own while still working within the infrastructural bounds: it registers with Eureka, it goes out to the config service, it uses the event service if it needs to, it hooks up to a database. But all the dependent services around it are mocked out. If it passes its functional tests, that environment is then torn down again, so we use the resources only for the period of time we need them. If it fails the functional tests, that space is left intact so the developers can come back around, investigate the logs, and do some analysis to see why it failed.

Once it passes the functional tests, we move it on into what we're calling the integration test organization. The integration test organization is a full-up environment: all the microservices within the ecosystem are there, and it runs integration tests against your microservice to make sure it works in a full-up environment. The integration test environment is also a place where test or development folks will go and just do some exploratory testing with their microservices and experiment with what kind of tests we want to have in our automated test harness.

Then, from the integration test environment, we have a bit more of a controlled environment called the regression test environment. Currently, we actually have a manual gate right here. We're still dealing with some cultural issues where there's a group that wants to have control over how things flow into an environment; they want to make sure there's no change happening to that environment while they're running their tests. We feel we kind of have to earn the right to make that an automated deploy. Over time, we hope we gain the trust of the community to say: yes, it all works well, we have this rich automated test harness, let us just flow into the regression test environment. But currently, this is a manual gate. So we've got open source Cloud Foundry here in these environments, and in our production environment we have Pivotal Cloud Foundry. As Mike said, that path is the one we're just starting to burn in.
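Here's the small sketch of the mocking idea from the functional test organization, compressed into a single local JUnit test using WireMock. The class names, URLs, and the canned inventory response are made up for illustration; in the real pipeline the service under test is the deployed archive in its clean space and the stubs have to be reachable from it, but the stubbing pattern looks roughly like this.

```java
// Rough sketch of "dependent microservices are mocked out" in a functional test.
import com.github.tomakehurst.wiremock.WireMockServer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;
import static org.junit.Assert.assertTrue;

public class DeliveryServiceFunctionalTestSketch {

    private WireMockServer inventoryStub;   // stands in for the real inventory microservice

    @Before
    public void startStub() {
        inventoryStub = new WireMockServer(8089);
        inventoryStub.start();
        // Canned response the service under test would normally fetch from inventory.
        inventoryStub.stubFor(get(urlEqualTo("/inventory/orders/42"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/hal+json")
                        .withBody("{\"orderId\":\"42\",\"status\":\"READY\"}")));
    }

    @Test
    public void dependentServiceCanBeFullyStubbed() throws Exception {
        // In the real pipeline this URL would be the deployed microservice's route in its
        // clean functional-test space; here we hit the stub directly just to show the
        // canned dependency in action.
        URL url = new URL("http://localhost:8089/inventory/orders/42");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            assertTrue(in.readLine().contains("READY"));
        }
    }

    @After
    public void stopStub() {
        inventoryStub.stop();
    }
}
```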
So each one of these is an organization, and I'm going to dive into what the spaces look like within each one. We have our organization here, and we have one space we're calling infrastructure. This is where the utility microservices that Mike was talking about live: in that space we'll have the config service, the event service, and a few other services that are more utility services. Then we have a space per business domain, so we'll have inventory, and each of these, MCS and MPS, are specific business domains within the satellite arena. Now notice, over here to the side: this would be our pattern one, and over here we have our pattern two. These are services that each one of the microservices can leverage if they want. They're all leveraging Eureka, which we have running over here as a VM. We're using ActiveMQ for our queuing mechanism, we've got ELK over here, and we're using Postgres for our database. So that gives you an overview of what our environment looks like. DigitalGlobe is hiring, and ECS is also hiring, so if you're interested in doing this professionally, there are some opportunities for you, and there's contact information. Any questions? If you could step up to the mic or speak loudly, we'll address those now. I think we've got one here.

"Thanks, guys, great talk today. You mentioned earlier in the session the importance of standards, and yet standards have certainly gone through a lot of evolution over the last five-plus years, in terms of standards boards and entities versus de facto standards through code. How do you navigate between the old and new world?" We have kind of a hierarchy of standards: we try to go with open standards, then industry standards, and then DigitalGlobe standards. If there's an actual open industry standard out there, we try to use that. Barring that, if there's a satellite industry standard or something in our industry that's fairly standard, we use that, and the last resort is: let's at least build an internal standard. When it comes to interoperability and things like that, we're trying to use OAuth 2 and latch on to big standards like that when need be. There are a lot of Open Geospatial Consortium standards that we leverage too. We're also using HAL, the Hypertext Application Language, for all of our JSON responses; it's not too heavy a standard, and it lets you describe your structure in the response.

Any other questions? "Why did you choose to use Pivotal CF in production as opposed to just rolling it yourself? You'd already done that for your dev/test environment." It's a comfort factor right now. For upper management, the five satellites that are up there, or will be up there, represent billions of dollars of investment, so when something goes wrong they just didn't want to go open source at the beginning; they wanted a big brother to be able to call if something was wrong. So for those environments, Pivotal will probably always be the vendor at DigitalGlobe for that reason. Any other questions? Well, thanks, everybody, for coming. Appreciate it. It's lunchtime.