Good morning. I'm Tony Irwin. I was given a bit of an intro earlier, but I'm the lead architect for the Bluemix UI, and over this half hour I'm going to talk about our team's journey from monolith to microservices. We'll talk a little about the origins of the Bluemix UI, about some of the problems we had with our original monolithic implementation, about how microservices helped us deal with some of those demons, and then about some new problems that came with our microservice architecture. There's already been some mention today of the added complexity you take on when you go to microservices.

The Bluemix UI is the front end to Bluemix, which is IBM's open cloud offering and features Cloud Foundry as a big part of it. It lets users create and manage Cloud Foundry resources, containers, virtual servers, accounts, billing usage, et cetera. It runs on top of the Bluemix PaaS layer, which is Cloud Foundry. It started as a single-page application; the intent was to provide a desktop-like experience in the browser, so it loaded all the HTML in one page and used JavaScript to manipulate the DOM. The stack was kind of the state of the art, at least within IBM: a Dojo front end and a single Java server back end. Three or four years ago that was the common stack, but we fairly quickly figured out it wasn't really where we wanted to go for our cloud environment.

Here are just a few random pages from the Bluemix UI; it's a fairly big application with a lot of different pages. And this is a very simple picture of our original monolithic architecture. At the top is the Bluemix UI client, which is the browser, with some of the different components we had, like the home page, catalog, and dashboard. This is really just showing that the UI logic was pretty much all loaded onto the front end. The back end running in Cloud Foundry was the single Java server, and that Java monolith would talk to the various back-end APIs like Cloud Foundry, UAA, billing, authentication, et cetera.

This led to several problems. One was performance. Dojo is a very large JavaScript framework, so we were loading this heavyweight JavaScript, which led to some bottlenecks. Since it was a single-page application, there was really no data included from the server in the initial payload, so everything was AJAX requests, which again bogged things down. It was also difficult to integrate code from other teams, and this will be a theme as I get deeper into the talk. We wanted to build a flexible framework, because at IBM I've got my core team, but there are 20 other teams that also want to be part of this console framework, and asking everyone to somehow write a Dojo plug-in to get into our monolithic app really wasn't practical. And, as you already know if you've dealt with microservices, with a monolith every update means pushing the whole thing rather than a smaller part of the system, so that was an issue.
Search engine optimization was poor, because as I mentioned the HTML payload really didn't include any searchable data. And the last point: I don't know how many people in the room have ever used Dojo, but at the time we had some new hires come in who wanted nothing to do with it, and rightfully so, so we wanted to move to a more modern, lighter-weight stack.

So we decided to go to microservices, and these are some of the advantages we thought would help us deal with the problems of our monolith. One issue was that we were a live production product. We wanted to totally re-architect it to microservices, but obviously that takes time, especially when you've got pressure to keep adding new function as well. So we went down the path of slowly breaking the monolith into microservices, continuing toward a lighter-weight stack while keeping the core monolithic part of the product alive as we broke things apart. We aimed for smaller services optimized for speed and page size, so we went with lighter UI frameworks and services focused on a smaller subset of things than the monolith as a whole. Microservices also help increase developer productivity: there's less chance of breaking other parts of the product because you can deploy individual microservices, and loosely coupled services can deploy on their own schedule. I alluded to this before: we have all these other teams that want to plug in, and they don't want to wait for one big push; they want to update as they have changes. The SEO point is a little fuzzy. It wasn't really microservices that allowed us to improve our SEO, but as part of this re-architecture we started doing more server-side templating, including more content in individual pages rather than having one big monolithic single-page app, so the pages had more content that Google and other search engines could crawl. I'll get into this a bit more shortly, but as the core team worked and other teams plugged in, we wanted it to look like one product, so we used some microservice composition to share common UI elements.

This diagram shows our microservice pattern; a typical UI microservice is implemented this way. Again, we have the Bluemix UI client at the top, which is where all the HTML, CSS, and JavaScript live. The JavaScript may be vanilla, or it could be Polymer, React, or Angular; we've been focusing on React more often than not within our product. I always forget to talk about the proxy piece, the gray box there. The proxy is really what makes the microservice system come together, because as requests for a particular path come in, the proxy decides which microservice to send the request to. The Node.js microservice in the middle, the green box, might be the catalog microservice, for example, so when a request for /catalog comes in, it's forwarded on to that microservice. As I alluded to, all of our new microservices are in Node.js, and we're using Dust.js for server-side templating. A UI microservice needs to call our common header microservice, which was alluded to before; it provides an API to get the header at the top of the page so that all pages can at least share that piece.
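To make that pattern a bit more concrete, here is a minimal sketch of what one of these UI microservices could look like: an Express app that fetches the shared header snippet and composes it into a Dust template on the server. The service names, URL, and template are illustrative, not our actual code.

```javascript
// Minimal sketch of a UI microservice in this pattern (hypothetical names/URLs).
const express = require('express');
const http = require('http');
const dust = require('dustjs-linkedin');

// Compile a trivial page template; real templates would live in .dust files.
// {header|s} inserts the header snippet without HTML-escaping it.
dust.loadSource(dust.compile(
  '{header|s}<main><h1>Catalog</h1><p>Hello, {username}</p></main>',
  'catalog-page'
));

// Fetch the shared header HTML from the common-header microservice.
function getHeader(callback) {
  http.get('http://common-header.example.com/api/header', (res) => {
    let html = '';
    res.on('data', (chunk) => { html += chunk; });
    res.on('end', () => callback(null, html));
  }).on('error', callback);
}

const app = express();

app.get('/catalog', (req, res, next) => {
  getHeader((err, headerHtml) => {
    if (err) return next(err);
    // Server-side composition: header snippet plus page content in one payload.
    dust.render('catalog-page', { header: headerHtml, username: 'jdoe' }, (renderErr, out) => {
      if (renderErr) return next(renderErr);
      res.send(out);
    });
  });
});

app.listen(process.env.PORT || 3000);
```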
These microservices may also call other API microservices, and certainly the various back-end APIs. We're also using Redis for shared session storage, so you authenticate and then these UI microservices can grab tokens out of the session.

This is a visual depiction of how we compose the pages. The green box is just a microservice. It invokes the common header microservice, and the strip you see at the top is what we call the header. With the server-side templating, the microservice can take that HTML snippet, put it into its overall HTML payload, and serve that, and then you've got a header with user information in it as well as whatever content the page wants to provide.

This whole process started for us a couple of years ago, and I'll show the stages of our progression. We're pretty much where we want to be, though you never quite get everything exactly the way you want it. Back in February of 2015 was the first release we had that included any microservices at all. As I mentioned, we wanted to start slowly, so we started with Home and Solutions, which were two smaller pieces of the monolith; we thought, let's just make those microservices to start with. You can see those green boxes moved out of the top and became microservices on the back end. The Bluemix UI server, our Java app, was still running there. We've got the proxy, which I mentioned, routing requests to the various apps as needed, and it's all deployed to Cloud Foundry.

I was at a conference in Shanghai a couple of weeks ago, more of an academic conference, where folks were very interested in research around how you break a monolith down into microservices. My view is that it's more of a black art than a science at this point. We started with a couple of smaller components, and then slowly moved things that seemed to logically go together. Some of the academics at that conference asked whether you could assign a score to your architecture to rate how good it is. No one had an answer; I think that's a very difficult thing. Beauty is in the eye of the beholder in some of these microservice breakdowns, and there are lots of different ways to slice and dice.

Phase two was about a year later, and I'd say 90% of our migration was done at that time. The account pieces were still mostly on the client side. You see a lot more green boxes as that UI logic moved down into separate microservices. Our Java server is still there, but it's doing less and less; by this point it's just serving some APIs that we hadn't yet ported to Node. Our end goal is to get rid of the Java server entirely. We do technically still have it, but just for team consistency we would like to be all on Node.js, so that's something we still want to remove. We're essentially at this point now, except for some remnants of the Java server lying around.

I mentioned plugins and other teams wanting to be able to plug in, be part of the console, and deploy on their own schedule, and this diagram is intended to show that. We talked about how the proxy routes requests to our individual microservices.
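As a rough sketch, the routing logic in that proxy amounts to a small path-to-target table, something like the following; the http-proxy module, route table, and hostnames here are illustrative, not our actual configuration.

```javascript
// Sketch of path-based routing at the proxy layer (hostnames are made up).
const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({});

// First matching path prefix wins; unknown paths fall through to a default app.
const routes = [
  { prefix: '/catalog',   target: 'http://catalog-ui.example.com' },
  { prefix: '/dashboard', target: 'http://dashboard-ui.example.com' },
  { prefix: '/watson',    target: 'http://watson-ui.example.com' },   // team-owned plug-in
];
const defaultTarget = 'http://home-ui.example.com';

http.createServer((req, res) => {
  const route = routes.find((r) => req.url.startsWith(r.prefix));
  proxy.web(req, res, { target: route ? route.target : defaultTarget }, (err) => {
    res.writeHead(502);
    res.end('Bad gateway: ' + err.message);
  });
}).listen(process.env.PORT || 8080);
```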
The other teams are the yellow boxes on the left of the diagram: Watson, IoT, OpenWhisk, various components that other teams own. A path like /watson routes all requests to the Watson endpoint they've provided. That endpoint may be a single microservice in its own right, or it may be a proxy to other microservices; we really don't care. We've got about 25 core microservices, and this slide says 15 teams, though I think it's probably closer to 20 teams within IBM that have plugged in to this infrastructure now. They may each have a handful of microservices, so it's a very loosely coupled system, maybe 100 microservices when you add up all of the teams involved.

When you move to microservices, though, the saying is there's no free lunch: sometimes you trade one set of problems for another. It has been a good move for us, but it does bring some added cost.

There are more moving parts. As I mentioned, we have 25 to 30 microservices plus all the plug-ins, so there's more complexity, and the build pipeline becomes all the more important to be able to orchestrate deploying that many microservices.

Federated status and monitoring: I'll have another slide that goes into this in more detail, but I think it's something we really underestimated when we started. When you have all these loosely coupled microservices, you need to be able to monitor them. A problem comes in at 2 a.m.; how do you figure out what went wrong? Or if you have a performance bottleneck, how do you figure out which component is causing it? So we've invested a fair amount in monitoring, and I'll talk a little more about that shortly.

Then there's the granularity of microservices versus memory allocation. As I mentioned, it's more art than science when you're breaking a monolith down into microservices, but deployment is one consideration. I've heard people use the term "nanoservices"; we did not want to get to the point where we had a thousand different microservices, so we've got more like 25 to 30. In Cloud Foundry, as you know, you have to allocate memory per instance up front, and we noticed a significant increase in memory usage by doing this. Our single Java app ran three instances of two gigabytes apiece, so roughly six gigabytes allocated to those Java instances. With our microservice system, and this was some math I did when we had about 27 apps and roughly 95 instances, even at 512 megabytes to a gigabyte apiece that adds up to about 55 and a half gigabytes. So that's far more memory than the monolith took.

There were also some issues because, as I mentioned, we needed to keep the monolith running, so we had to work at seamless navigation between the single-page app and the individual pages.

And blue-green deployments: doing a blue-green deployment in Cloud Foundry with one app is easy, but when you have 30, what do you do? We ended up doing blue-green at the proxy layer. We deploy one set of 30 microservices and another set, and we switch between them at the proxy.
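Here is one way that proxy-level blue-green cutover could be sketched, assuming the same kind of path routing as before; the route sets, hostnames, and the admin hook are hypothetical, just to show that the cutover (and rollback) becomes a configuration flip at the proxy rather than a redeploy of every app.

```javascript
// Sketch of blue-green cutover at the proxy layer (names/hosts are illustrative).
// Two complete sets of microservices stay deployed; the proxy decides which
// set receives live traffic.
const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({});

const routeSets = {
  blue:  { '/catalog': 'http://catalog-ui-blue.example.com',
           '/dashboard': 'http://dashboard-ui-blue.example.com' },
  green: { '/catalog': 'http://catalog-ui-green.example.com',
           '/dashboard': 'http://dashboard-ui-green.example.com' },
};
let active = 'blue';   // flipped to 'green' once the new set passes verification

http.createServer((req, res) => {
  if (req.url === '/admin/cutover') {        // hypothetical admin hook
    active = active === 'blue' ? 'green' : 'blue';
    return res.end('active set: ' + active);
  }
  const targets = routeSets[active];
  const prefix = Object.keys(targets).find((p) => req.url.startsWith(p));
  if (!prefix) { res.writeHead(404); return res.end(); }
  proxy.web(req, res, { target: targets[prefix] });
}).listen(process.env.PORT || 8080);
```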
Promoting uniformity and consistency: if something keeps me up at night at IBM, sometimes it's this. We want these individual teams to be able to plug in and have the freedom to deploy on their own schedules, but if you want a product with a consistent UI experience, what sort of policing and quality standards do you put in place when you've got other teams plugging in? I'm not sure we've totally nailed that one down yet, but it is a concern in our case.

And geo load balancing and failover: not really required for microservices, but we had invested all this time, and I have another slide dedicated to this shortly, in trying to get HA and resiliency for our microservice system. If you're running in just one data center and that data center goes down, there's not much you can do, so we undertook some efforts to load balance between different Cloud Foundry deployments.

I mentioned monitoring; here's a little more detail. Lots of things can go wrong when you've got this many microservices: they're all talking to various back-end APIs that can have problems, Cloud Foundry can have problems, and there can be networking issues. How do you figure out what's wrong? So we built a monitoring system, and these are some of the metrics that have helped us along the way. For all of our microservices we track all inbound and outbound HTTP requests, with response times and error codes, and we put it all into Grafana; there's a little example of a Grafana chart at the bottom. When you start seeing various components returning a bunch of 500 errors, for example, you might see a big bump in red; red indicates 500 errors here. There's not a lot of red in this chart, but if you see a big red bump, there's probably a problem. We also care about memory usage, CPU usage, and uptime for every microservice, so we keep track of app crashes and those kinds of things; if your app is crashing, your quality of service is probably going to be impacted. And we watch the general health of ourselves and our dependencies. As an example, we use Redis for shared session storage, so one of the things we do is track how healthy our Redis system is; if that starts to have issues, it's something we need to resolve quickly. Aside from monitoring real traffic, we also use sitespeed.io to generate some synthetic page loads so we can look at front-end performance as well.
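A minimal sketch of that kind of per-request instrumentation plus a health endpoint in an Express microservice follows; the library choices (hot-shots for StatsD-style metrics feeding Grafana, ioredis for the Redis ping) and the metric names are illustrative assumptions, not necessarily what we run.

```javascript
// Sketch of HTTP metrics and a dependency health check (hypothetical names).
const express = require('express');
const StatsD = require('hot-shots');
const Redis = require('ioredis');

const metrics = new StatsD({ prefix: 'catalog_ui.' });
const redis = new Redis(process.env.REDIS_URL);
const app = express();

// Record duration and status code for every inbound HTTP request.
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    metrics.timing('http.inbound.duration', Date.now() - start);
    metrics.increment('http.inbound.status.' + res.statusCode); // 5xx spikes show up as red in Grafana
  });
  next();
});

// Health endpoint: reports whether this instance can reach its dependencies.
app.get('/healthcheck', (req, res) => {
  redis.ping()
    .then(() => res.json({ status: 'ok', redis: 'up' }))
    .catch(() => res.status(503).json({ status: 'degraded', redis: 'down' }));
});

app.listen(process.env.PORT || 3000);
```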
This slide shows the global console that we just recently released. We have four public Cloud Foundry deployments at IBM as part of Bluemix: Dallas, London, Sydney, and Frankfurt. We used to have individual URLs for each of those regions, and they were very separate deployments. With the global console, as we call it, we now have one URL, console.bluemix.net. We do still have a region selector, so if you want to create Cloud Foundry apps in Frankfurt or Sydney there's a switcher for that, but instead of doing a whole page reload with a new URL, it's more of a filter in place.

I haven't really talked about Akamai, but we use Akamai here, and the important part for load balancing is that it does a DNS lookup against a DNS load balancer that has the IP addresses of our different Cloud Foundry deployments. It returns the IP address of the deployment that is closest to you geographically, so if I'm in Sydney, hopefully I get the UI served from the Sydney data center rather than the US data center; that's an improvement in performance right there. The other thing is that it returns the nearest healthy data center. If you're in Sydney you would normally go to the Sydney data center, but if that data center goes down you get routed to the next closest one, which might be Frankfurt, say. So you don't notice any interruption in service, and we can go figure out what's wrong in Australia. The odds of all four regions being down at once are pretty slim compared to one region being down.

And that brings us to the end. I think we've got time for questions? Yeah, there are two.

That's a good question. I think we've probably built up some technical debt in terms of dead code being in the Java piece. In some cases, as we ported things, we were better about deleting that code than at other times. We do have a goal, as I mentioned, to knock out that Java server eventually, so there will be a bit of a challenge there. We'll be able to see what API calls we make into it, so we'll know what's still being used, but when you're in a porting exercise there will be pieces of Java code we don't want to port at this point because they've already been ported or are no longer used. I don't know if that answers your question.

The proxy that you mentioned, what exactly does it do beyond just routing to a certain URL internally? Really nothing else. We tried to keep that layer very thin. It's actually a Node.js app in Cloud Foundry. We'll probably be moving to NGINX, but there's really no intelligence other than looking at the host or the path and routing to the right microservice.

So why don't you use the default routing option in Cloud Foundry? Well, I think we started this before that was even an option, so that's probably one of the big reasons.

I've heard arguments that UIs need to be monoliths because you want to guarantee a seamless user experience between the different UI components. How did your UI design process change as you split up your UI into really separate microservices? From a user experience design perspective, I don't think it changed a whole lot. We do have more cases where you click on a button and get a full page reload as opposed to just some DOM updates occurring, but I don't think that really impacted our design approach much. Now, there are cases where the UI designers propose something that doesn't necessarily fit very well with how our code is broken up. That can be a challenge, but I've typically thought we shouldn't let an architectural decision we made at one point impact our user experience too much. So if there's some swizzling we need to do, or extension points or whatever, to make some of those things happen, then we need to be open to that.
You mentioned that you do blue-green deployments via the proxy, so you are deploying all of your microservices blue-green even if you only changed one, right? That's right, and that's a very astute observation. We've taken some steps to improve it, so we do have the ability to swap in individual microservices, but it's not as good as it could be. That's a shortcoming we have, although we've gotten smarter about at least being able to update the proxy config of the existing deployment to point at a new version of a microservice, so we may still be doing a blue-green deployment at the proxy without actually deploying all new apps to do it.

How many times do you deploy? How many times? I know some people like to deploy multiple times a day; we're more like a couple of times a week, typically. There are a few reasons for that. One is that when you're making UI updates, we have automated tests and such, but sometimes you still need a visual inspection to make sure nothing looks too askew, so there is some testing overhead when you're doing UI. Thank you.

The room goes way back there; I can barely see you. Hi there. Obviously individual teams will have responsibility for testing their own microservices, but in terms of the product as a whole, where does that responsibility lie, and what sort of challenges have you experienced? I'm sorry, in terms of testing? Yes, testing the whole product; individual teams are responsible for their own pieces, but who owns the overall piece? Right, and that's a real challenge. I mentioned that we have automated tests, so each microservice owner is certainly responsible for having a good automated set of unit tests. It's more challenging when you want to test end-to-end flows. We do have some automated testing around that, for example testing a page served from one microservice transitioning to another, but those tests are typically stored separately from the microservices, and I think we still need to do a better job there. We have a couple of QA folks who would love it if we required a little less manual testing than we do today.

Hey, my question was: you compared the Java memory footprint and the microservices footprint. Is that for roughly the same functionality in both cases, or was there also a difference functionality-wise? No, we probably added some new function, but by and large it was equivalent functionality. Okay, thanks.

You're getting a workout. My question is: you showed the different data centers of the installation. How did you manage to synchronize state between the data centers? That's a very good question. Our UI code doesn't really store a lot of state; we really rely on the back-end APIs. Just as an example, the UI lets you create a Cloud Foundry application, and that state is maintained in Cloud Foundry by the API controller and all of its back end. The one thing we do store is the user token in the Redis session. So if we do a failover, we do have to do a quick refresh of the token; we've got some cookies and things, so we can refresh the token quickly, because our Redis deployments are separate.
So Redis is used for that in the back end? Exactly. We don't necessarily copy everything from a Redis here to a Redis there, but since we don't store very much in there to start with, we just get a new token and put it in Redis when the failover occurs. Thanks.

Monoliths tend to have an interesting internal structure as well, right? There might be some layering going on and things like that. So in your case, did the decomposition just fall into clean top-to-bottom slices, or were there also some downstream services, and if so, how did you cope with that? Yeah, that's a good question. I showed the screenshot earlier of the different sections of our UI. Our monolith didn't really provide a whole lot of UI on the server; it just served all the JavaScript and HTML, and we had to make back-end API calls, so the Java server ended up becoming just an API server. Now, some of those APIs tended to be fairly specific; an API doing some Cloud Foundry manipulation, like creating apps, was probably only used by our catalog microservice. So we did look at the various pieces of the UI, catalog, dashboard, accounts and billing, and those were roughly some of the pieces in our monolithic code as well, so the split was somewhat natural. I can't say we went and looked at the class hierarchy of our Java app and used that to drive the microservices we broke out; they were bigger components than individual Java classes and things like that. I don't know if that answers the question.

Thanks, Tony. Good talk, good questions. We've run out of time, so we have to switch over to the next one. All right. Thank you.