So in addition to coming out here and speaking at MesosCon, I also get to work with partners like Mesosphere on cool new things to do with JFrog's products, as well as work a lot with customers on how to do DevOps with the JFrog tool suite. So there are a lot of interesting different perspectives I get in my job. And today, as the title of the talk suggests, I want to talk about a mix of that: what the experience of some of our R&D efforts was, versus what we expected from R&D, versus what happened as we got out there in the real world. So to think about that, you know, we start with the core concept: microservices are cool. If I need to explain what a microservice is, you're probably at the wrong conference. So I'm going to skip that part of things. But if you do have questions about microservices and why they're cool, I'd be more than happy to talk about it afterwards. But a lot of the time when we think about microservices, we tend to think about microservices in a SaaS environment. And if I now have to provide instructions for deploying your microservices environment, well, how many of you have a microservices environment out there? And how many of you would like to try to explain to a customer how to get it set up and running? Yes, that's what I thought. No hands raised. So, you know, this is something that we've spent the last year, you know, really putting some thought into and trying to figure out. So first, what's the big difference between SaaS and on-premises? Well, the big difference is ownership. With a SaaS environment, I understand my target environment quite well. I own my target environment. I can change my target environment as needed. I have direct access to the developers if they need to change code to conform to some aspect of the target environment. 
Each service can be individually owned by a DevOps team that takes the service from development all the way to production and owns it for the entire lifecycle. Whereas, on-premises, the customer owns everything. Their environment has its weird quirks, whatever they are. The customer has to be able to take ownership. The only communication the customer is going to have with the developers is going to be mediated through customer support. It's a very different scenario, and it's been interesting. So, I'm going to talk a little bit about three different products and, you know, where we're kind of at. So, the first of the products that I want to talk about is JFrog Artifactory. JFrog Artifactory is the premier universal binary repository manager out there on the market. Although, for the purposes of this discussion, perhaps the more salient point about it is that it's, you know, a 10-year-old monolithic web application architecture running on Tomcat that's been deployed to thousands of customers worldwide, and tens of thousands, even hundreds of thousands, if you consider the open-source distribution. We also have JFrog Bintray, a microservices SaaS application. It does software distribution. If you've used JCenter, maybe doing Android or Gradle development or other types of Java development, or if you've used Homebrew, you've leveraged downloads from JFrog Bintray: about 2 billion downloads a month, very scalable, very microservices-y. It's been operating for about five years. In the future, we're going to want to take this on-premises. And the other project I want to talk about is JFrog X-ray. Fairly new project. It came out in 2016. It was designed from the start for this idea of on-premises microservices. And we'll talk about what we learned from the first year and a half of its evolution. So we'll talk first about Artifactory, migrating that legacy monolith into containers and into orchestration systems. 
Then we'll talk about the X-ray experience, what we did over the last year, and how we're taking that forward into moving Bintray on-premises. So let's start here with Artifactory. So this is the architecture of Artifactory that I started with several years ago. And it's a fairly solid architecture. Out front, you have a load balancer. On the back end, you have a shared storage layer of a database and some sort of file storage mechanism, such as NFS. And in between, you have very, very nearly stateless Artifactory nodes. We thought that this was pretty much pre-made and ready to go for a cloud-native application. So the very first attempt, back in 2015, involved work with VM orchestration, with one of the major VM orchestration PaaS providers. And we learned a lot of interesting things about our product in this first attempt, and a lot about separating the application layer and the configuration layer. So a real example of this is the health check. Pretty basic function of any sort of orchestration platform: you want to have a health check. Artifactory has long had a really great health check, the system ping API. It goes out, it tests network connectivity, but it also does some basic health check functions on the application itself. The problem was it required a real user. Now, if your Artifactory has, as many do, anonymous access enabled, you didn't have to put any credentials in. But if somebody wanted to lock down the Artifactory and turn off anonymous access entirely, then you needed to insert credentials. And this wasn't a problem in our SaaS edition, where we have a backdoor account that can do things like this on the systems. And it's not a problem on premises, where you own your architecture, so you know where you have to go. If you change the credentials of a certain service account, you know all the places you have to go to make the changes. 
But you move into an orchestration layer, where the health checks are usually buried pretty deep into the code. And unless you want to have a whole bunch of extra configuration that a user has to fill in, things can go badly wrong. I discovered this bug when I was doing a test. I happened to need to force authentication on something, so I shut off anonymous access, and suddenly my services all went down. Because what does an orchestration layer do when you fail the health check? It takes the service down and tries to restart it. And it kept trying to restart it, and kept trying to restart it, and I was kicking myself for an hour trying to figure out what the heck had gone wrong. And then I realized, oh, the reason it keeps restarting the service is that it gets to the health check and the health check fails. And so we had to make a way to access the health check that was 100% independent of user credentials. I'm going to give a lot of fairly specific examples in this talk because I want to tell you the real pains that we suffered. I'm hoping that you can generalize these examples on your own to things in your own use case that you may want to think about. So this was the first major lesson that we learned. So after we'd done the VM orchestration, the world evolved a little bit, and we were looking at startup scripts. The health check was a continuing problem in the sense of: when do you start running the health check? It's a real health check, so until the service is up, it doesn't return okay. If the service takes longer than usual to start, what does that mean? Either I have to have some sort of very complex startup script that tries to identify when the system is actually up and somehow then wait to trigger the health check until that happens, or I just wait long enough. Both of those answers have problems, and to be honest, we still don't have a really satisfactory solution to that for the Artifactory application. 
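The credential-independent health check described above can be sketched roughly as follows. This is a minimal stand-in, not Artifactory's actual implementation: the handler name and the internal check function are hypothetical, and the key point is simply that the ping path never consults user credentials, so an orchestrator's probe succeeds or fails only on real application health.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_application_health():
    """Stand-in for the real internal checks (DB reachable, storage
    writable, etc.). Hypothetical; Artifactory's checks differ."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Deliberately no authentication here: the orchestrator's probe
        # must work whether or not anonymous access is enabled.
        if self.path == "/api/system/ping" and check_application_health():
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(503)
            self.end_headers()
```

The design choice is that the probe endpoint sits outside the normal authentication filter chain, so locking down anonymous access cannot take the whole cluster into a restart loop.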
In some of our orchestration solutions that we've built over the years for Artifactory, we just wait. In other ones, we try the complex script approach. We've run into problems both ways, and we still haven't really settled on what we think the best practice is on this one. Yes. So there's nothing wrong in Artifactory. Artifactory can fail its health check, and it'll sit there, right? But generally speaking, if you fail the health check in Artifactory, it means that a user will also fail to do a write, and in some cases a read. So startup scripts are something that have kind of been a recurring theme, and we'll talk some more about them as time goes on. It's something you really have to put some real thought into. The startup scripts in general for our containerized versions of our applications, and for orchestrating those containers, tend to be more complex than when we were expecting somebody to run the startup commands from the command line. So then we containerized it. We had an RPM install for a while. The very first container we ever wrote, we just installed the RPM in the container. Everything was good. But back then, at least, when we went to make that container highly available, we ran into problems because our highly available structure typically had a slightly different directory structure to manage the shared storage. You ended up needing NFS, which you can't really easily get in containers unless you've got that built out ahead of time. Anybody who wanted to do HA basically had to take our default image and build on top of it, which is indeed what we did when we did our first Mesos implementation. So Mesos was the first platform on which we attempted to deploy containerized Artifactory through orchestration with self-healing. We'd done it before, as I said, with some VM orchestration tools, but Mesos really took it to the next level, where you have to get much more serious about being cattle instead of pets. 
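The "just wait long enough" approach to startup can be sketched as a simple readiness poll. This is an illustrative sketch, not JFrog's actual script: it polls a health URL until it answers 200 or a timeout expires, and the hard-coded timeout is exactly the compromise discussed above (too short and a slow-but-healthy startup is declared dead; too long and real failures go unnoticed).

```python
import time
import urllib.request


def wait_for_ready(url, timeout=300, interval=5):
    """Poll a health endpoint until it returns HTTP 200 or the timeout
    expires. Returns True on success, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # service not accepting connections yet; keep polling
        time.sleep(interval)
    return False
```

A usage might be `wait_for_ready("http://localhost:8081/api/system/ping")` before the startup script reports the service as started.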
Again, thank you to the Mesosphere team for assisting us with that. And of course, most of the self-healing aspects of it and the orchestration, DC/OS and Marathon took care of. It just works. A service goes down, it checks its health check, it'll restart itself. I want to scale it horizontally, it just goes and does it. I need a database, there are several great database services out there for Mesosphere. The problems we had were twofold. The first was that the user had to supply an external NFS mount. At this time in Artifactory's architecture, there was absolutely no way to start highly available Artifactory without NFS. You didn't need very much NFS space, but certain configuration files had to be shared on NFS, and certain caches had to be shared on NFS. The other big issue: Artifactory is paid software in highly available mode, and we do licensing per running JVM. And so, you know, the tool has things built into it to identify, have I seen this license before? Is this license being used correctly, etc.? And the expectation was that you would easily be able to allocate a license to a node, and that you would understand that that license belonged to that node for all time, and you would know, you know, how to replace that node, etc. And that wasn't true in Mesos, where node IDs were changing regularly. You know, whenever it regenerated a node, I got a different node ID, a different location. So we had to do some fairly extensive hacking to the management of licenses in order to get it working. And again, I want to stress, we started with effectively a stateless application on top of shared storage. You know, it wasn't that the core architecture of Artifactory was the problem here, just little tiny implementation decisions we'd made along the way, where we hadn't thought about what it means when you try to deploy something as cattle instead of a pet. So this led to Artifactory 5. We had been on Artifactory 3 and Artifactory 4 before this. 
We had to do a new major version of Artifactory to re-architect the system behind the scenes to address these concerns. Probably the very first time that we re-architected Artifactory not because of performance issues or because of, you know, any sort of issue in terms of how it was actually functioning, but purely to address infrastructure-related concerns. So Artifactory 5 is cloud-native ready. What did we have to do for it? Well, we had addressed the anonymous ping issue quite some time ago, but for this, the very first thing we did is we made it so you no longer needed shared storage for the config. So the nodes had to be able to retrieve their configuration information and share it. You know, if somebody made a change to configuration, that had to be able to be shared between the nodes more easily. That involved creating slightly more extensive crosstalk between the nodes, so that those config files could be moved from one to the other, and so that cached artifacts could be accessed from another node. No more NFS required. That made a lot of customers really happy, because a lot of Mesos implementations in the world have no access to NFS at all. And the other big thing we did is we shifted the license management so that rather than each node being expected to manage its own license, the cluster manages the licenses holistically. So that, you know, you put a set of licenses into the cluster, and then it just allocates them as nodes go up and down. It gives out those licenses accordingly. Simple stuff in some ways, but needed. And so today, while it's not master-slave, it's full active-active HA, but there is still a primary node that has some special functions, so it's still a little bit pet-like. For the most part, we've got kind of a cloud-native, cattle-type deployment that works for Artifactory, for Mesos, for Kubernetes, for Docker Swarm. 
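The shift from per-node licenses to a cluster-managed pool can be sketched as below. This is a hypothetical API for illustration, not JFrog's actual license manager: the cluster holds a pool and leases licenses to nodes as they come and go, so an ephemeral node ID never needs to own a license "for all time".

```python
class LicensePool:
    """Sketch of cluster-holistic license management (illustrative only).

    Licenses live in a shared pool; a node leases one when it joins and
    the lease returns to the pool when the orchestrator reaps the node.
    """

    def __init__(self, licenses):
        self._free = list(licenses)
        self._leased = {}  # node_id -> license

    def acquire(self, node_id):
        if node_id in self._leased:
            return self._leased[node_id]  # node restarted with the same id
        if not self._free:
            raise RuntimeError("no licenses available in pool")
        lic = self._free.pop()
        self._leased[node_id] = lic
        return lic

    def release(self, node_id):
        # Called when a node disappears; its license becomes available
        # for whatever replacement node the orchestrator spins up next.
        lic = self._leased.pop(node_id, None)
        if lic is not None:
            self._free.append(lic)
```

The point of the design is that node identity becomes irrelevant to licensing: only the count of concurrently running JVMs matters.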
And we've managed to break the first microservice out of this monolith. We call it JFrog Access. It manages permissions and authentication. So we've started to be able to break the monolith, which was the other big goal we had with Artifactory 5. We didn't just want to become cloud native. We wanted to set the architectural framework so we could start breaking our monolith up slowly but surely, in a way that would be nice. Now, it's still not separate containers. It's two WAR files in the same Tomcat for now. But, you know, as our chief architect puts it, it's conceptually completely separate. You know, they communicate with each other via REST APIs, that sort of thing, even though it's not yet separate in deployment. So that was one journey. It took a couple of years. We're fairly satisfied. That's not to say we don't have further to go. We're always working on streamlining this, taking these lessons learned, putting them back in. Artifactory was in some ways the hardest, because we couldn't afford for anything to go wrong. Because, you know, thousands of customers want to update fairly regularly, and if we make it too hard for them to do that, it's going to be very, very painful for us. So, now I want to go back and talk a little bit about the X-ray journey. JFrog X-ray went GA just over a year ago. And, of course, it had been in development for about six to nine months before that. So, basically, we started it at the beginning of 2016. And with this one, you know, we thought to ourselves, okay, we're creating a greenfield project. We want to start from the beginning with a microservices architecture. There's lots of reasons to go with microservices. We know that there's risks to deploying it on premises. But with the advent of Docker and containers and other things, we think that this is something we can do now. And this is the base architecture of X-ray. 
It's eight microservices, plus one microservice that's used only during installation and upgrade. Three of them are kind of standard services: RabbitMQ, which does event queuing, and MongoDB and Postgres for persistence of various aspects of the data that X-ray uses. And, you know, one web service that's the front end, and then four worker services. And we deployed it through Docker. Seemed pretty logical. And we were fairly happy with it. It had all of the growing pains you would expect with version 1.0 software. And there was lots of work, of course, to do to improve it, bring in new features, fix bugs, et cetera, as it came out. But on the deployment side, we were actually a little bit surprised. We architected this thing very carefully. We thought we were good. And the first thing that happened is our customers said, do you have an RPM for that? Yeah. So one of X-ray's primary use cases is security and compliance. The customers that are most interested in security and compliance tend to be the most conservative in their deployment architectures. And very, very few of them were willing to use Docker containers in production, period. This is something that's beginning to change. You know, where we were in mid-2016 is not where we are as we move into Q4 of 2017 on this subject. But even today, you know, there's a lot of customers where you say, oh, I want to give you a deployment based on Docker, and I want you to put that in production. And they look at you and say, yeah, well, someday we'll want to be able to do that. But not today. So the first thing we've come to understand is that, at least for now, if you say you're only going to deploy on Docker, that's going to severely limit who can use your software, or at least what environments they can use it in. So we ended up putting out RPM and Debian packages that you could use. The second lesson is a little bit more embarrassing, honestly speaking. 
And so the second lesson is really just: start like you mean to go on. It's not entirely clear, but hopefully when you looked at that architecture, you saw it was an event-based worker queue, producer-consumer queuing, very scalable horizontally. Should be very easy to make highly available. But, you know, for the 1.0 release, in the interest of simplicity, we focused on being able to deploy it with Docker Compose to one server. And it took us most of a year, because that was our only testing infrastructure, to find all of the places where we'd made assumptions about how the file system worked, about what the IPs were, you know, just little places where a developer had hard-coded something they should not have hard-coded. And it always worked, because it was always deployed in exactly the same way. So with this problem, you know, we should have known better. We're a company that advocates DevOps for our customers and for ourselves. And basically what happened here is that we let the developers go off without really talking about what the deployment architecture would look like, and without how to deploy it being a primary concern from the beginning. And it's cost us. And it's a mistake we probably aren't going to make again. And kind of along the same lines, it's about flexibility. On the deployment side, every fourth customer for a very long time, maybe more often than that at the very beginning, was asking for more flexibility in their deployment architecture than we'd originally planned. Whether that was, hey, I see you have a Postgres container. My policy says that I'm only allowed to use the officially blessed version of Postgres at my company. I can't use your Postgres container. I need to use my own Postgres database. 
Or, I need to specify custom paths, because by policy this partition in the file system can only be one gigabyte, to prevent people from storing certain types of data on it, and you're storing those types of data you're not supposed to store; I need to be able to store that data in a different path on my file system. Or, I don't have access to the web. I can't upgrade my Docker containers because I can't do a docker pull from your main Docker registry. I need to be able to pull your containers down to a private Docker registry and manage the services that way. That one's particularly embarrassing for us, in the sense that we're one of the primary providers of private Docker registries out there, and we hadn't considered the need for people to be able to use one. But no, I mean, flexibility in deployment architectures. Again, it comes down to the fact that you don't control the customer environment. And customers sometimes have really, really weird requirements that made sense 20 years ago, and that, they will even admit to you, make no sense today. But, you know, if they have to figure out a way to change them, deploying your software is going to go from something that they should be able to do in a day to something that takes a year's worth of paperwork. So finally, with X-ray, I wanted to get back to the question of startup scripts. So with Artifactory, there were a lot of complex factors in the startup script. X-ray was built as a microservices architecture from the start. So the services are truly completely stateless between each other. There's no dependency where one service has to be up in order for another one to start, or anything like that. Despite that, when we actually run our official X-ray start script, before we start each service we confirm that the dependencies are there that allow that service to operate fully. We went back and forth on this a lot. Did we want to do this or not? 
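The flexibility asks above (bring-your-own Postgres, custom data paths, a private registry) all reduce to one pattern: sane defaults that every customer environment can override. A minimal sketch, with illustrative variable names rather than the product's real ones:

```python
import os

# Defaults match the simple bundled deployment; each one exists
# precisely because some customer policy forbids the default.
# (Hypothetical setting names, for illustration only.)
DEFAULTS = {
    "DB_URL": "postgres://xray:xray@postgres:5432/xraydb",  # bundled container
    "DATA_DIR": "/var/opt/xray/data",                       # default partition
    "DOCKER_REGISTRY": "docker.example.com",                # public registry
}


def load_config(env=os.environ):
    """Return effective settings: environment overrides win, defaults
    fill the rest. Keeps the exposed surface small while letting a
    customer point at their own blessed database, relocate data off a
    size-capped partition, or pull images from a private registry."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```

The design tension is the one mentioned in the Q&A later: expose the customer-specific settings, but keep that set as small as possible and auto-detect the rest.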
And we looked around, we tried to do some investigation, because effectively what we're doing is introducing state into our startup script that theoretically doesn't need to be there. And you might ask, why on earth would you do that? And the answer is that our customers mostly came from a world of deploying monolithic applications. And they're accustomed to the idea that when the startup script finishes and says this application is started, they can start using it. And so we introduced this state basically to make it easier for a customer to understand what was going wrong, if anything, when they tried to start the system. It was an interesting decision. Like I said, at some level it's a matter of personal taste, but we felt, based on what we were seeing, that it made it easier for the customer to understand what was happening at deploy time and to track how the architecture really worked. So I guess the real lesson here isn't really so much about startup scripts; it's about trying to make it easier for the customer's head space to track the system. So, Artifactory and X-ray moving forward: Bintray on premises. And in fact, an entirely new initiative that goes with that, which we call the JFrog Platform. So what do I mean by that? Well, this is the architecture of the platform. And the numbers that you see next to each name are how many microservices I think are in each one. It's a little bit indeterminate for some of these, and this is more or less the count as it stands today, or as it's currently projected. So JFrog Access is going to be one service. Artifactory will probably be around two services by the time this is deployed. JFrog X-ray is currently nine. JFrog Mission Control is looking like it's going to be about five. And JFrog Bintray today is 25 or more. And that 25 or more is a pretty scary number. Again, this is a SaaS architecture. It runs in about 60 containers across several locations globally. 
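The startup dependency check described above can be sketched as a pre-flight gate: before each service starts, confirm its infrastructure is reachable, so a failure surfaces in terms an operator can act on rather than as a mysteriously wedged worker. A minimal sketch with illustrative service names and standard default ports; the real script's checks differ.

```python
import socket


def port_open(host, port, timeout=2):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Illustrative dependency map (hypothetical hostnames, standard ports).
DEPENDENCIES = {
    "rabbitmq": ("rabbitmq", 5672),
    "mongodb": ("mongodb", 27017),
    "postgres": ("postgres", 5432),
}


def check_dependencies(deps=DEPENDENCIES):
    """Raise with a readable message listing every unreachable
    dependency, instead of letting a worker start and fail obscurely."""
    missing = sorted(name for name, (host, port) in deps.items()
                     if not port_open(host, port))
    if missing:
        raise SystemExit(f"cannot start: unreachable dependencies: {missing}")
```

This is exactly the "state that theoretically doesn't need to be there": the workers would eventually reconnect on their own, but the explicit check matches the monolith-era expectation that "started" means "usable".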
And as we're looking to move Bintray on premises, as well as provide this sort of integrated deployment architecture to deploy all of the tools together, we really are thinking carefully about what we have learned over the last several years from the Artifactory journey and from the X-ray journey, and how we're going to carry that forward into the next one. So what are we going to do? So the first thing that we've decided, which hopefully isn't a surprise to anybody, is that simple is better. With X-ray in particular, figuring out how you debug a set of even eight microservices when something goes wrong, debugging remotely, again, through customer support, not able to go in and instrument the service directly whenever you want to, was hard enough. The idea of trying to do that for 40 services at once scares us. So on the one hand, we want to try to consolidate our infrastructure services as best we can at a given location. And also, while we want to keep to small services, so that you have that scalability and that developer flexibility that microservices provide, we also want to pay attention to proliferating services. It's not like a SaaS environment, where in some ways it's actually cheaper to stand up a new service than it is to consolidate old ones. So it's a little bit of a violation of the pure microservices architecture. I'm almost calling it more a small-services rather than a microservices architecture. But we think it's going to be important, at least until the market matures more on how you do this sort of thing. The second thing is we are going to start with that enterprise architecture deployment. We're going to plan for scalability and flexibility not just from an architectural perspective, which we did quite well with X-ray. I mean, the architecture was never broken for scalability and flexibility, just the deployment model. 
So from the beginning, we're going to deploy it and test it with the idea that that's our target environment, and try to build in some of the different types of customer environments that we've learned about as we do this. And as part of that, we're actually going to start with a container orchestration implementation, because what we've learned from Artifactory is that container orchestration imposes the most discipline on you in making sure that you've actually created proper cattle services. You know, there are a lot of cheats you can do with pretty much any other deployment model that proper container orchestration at least discourages. You can always cheat the system if you want to, but it makes it much harder. So we're going to start with a container orchestration mechanism as we do this, but we're also bearing in mind from the beginning that we're probably not going to be able to end there. We're probably going to have to build, you know, platform-native distributions for CentOS and the various Debian-flavored distributions. You know, RPM and Debian releases are pretty much a must still, and, you know, possibly even a Windows one. And, you know, that's going to be a bit of a challenge. But like I said, we're going to start with container orchestration because we consider that to impose the right architectural discipline on us. So finally, my final takeaway is really one word. Hopefully it's a word that you've heard before: DevOps. So DevOps means a slightly different thing when you're talking about on-premises software, because you're no longer talking about owning a service in production. By definition, you can't own a service in a customer-owned environment. But it does mean that the people that are responsible for managing, packaging, and deployment of the system, and for testing the deployment side of things, need to be working very closely with the developers from the beginning. 
And the values of DevOps remain; the value of ownership of all aspects of the architecture remains throughout the system. So yeah, hopefully you found this interesting. I'd be happy to take any questions you have. Looks like I've got about five minutes for questions. Oh, and by the way, if you want to come work on this sort of lovely stuff, JFrog is hiring. Feel free to talk to me about that afterwards as well. Any questions? Yes? You're doing some configuration for things like paths or anything else. How are you guys managing those as inputs from these customer environments? I assume this isn't like a Dockerfile change or something like that. No, that's a great question. So pretty much any of the PaaSes, or Mesos, or any of these, they have some set of inputs that the customer is expected to fill out. And so we want to have those configurations, the things that are customer-specific, exposed, but you want to have as few settings there as possible, obviously, to keep that simple. The rest you try to bury underneath: detect the situation and make the best decision based on what you detect when you get to the environment. Does that make sense? With regard to getting rid of NFS in Artifactory, once you moved to a more containerized stance, what did you end up doing for storage for larger repositories and whatnot? Excellent question. So the storage solution there is object storage, basically anything that follows the S3 API or the Google Cloud API or the Azure API, the sort of cloud-native APIs. But the S3 API in particular is fairly widely followed on premises. You can get several storage solutions in Mesos, for example, that do that. Most container platforms have some sort of solution that'll give me an S3 gateway. Other questions? 
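One reason object storage slots in so cleanly behind an S3-style API is content addressing: if binaries are stored under a key derived from their checksum, any node can locate any binary without shared local state, and the same code works against AWS S3 or an on-premises S3 gateway. A small illustrative sketch of such a key scheme (the layout here is hypothetical; Artifactory's real filestore details differ):

```python
import hashlib


def object_key(content: bytes) -> str:
    """Checksum-addressed object key: identical binaries map to one
    object, so deduplication and node-independent lookup come for free.
    The two-character prefix spreads keys across a flat namespace."""
    sha1 = hashlib.sha1(content).hexdigest()
    return f"filestore/{sha1[:2]}/{sha1}"
```

With keys like this, swapping NFS for any S3-compatible backend is a configuration change (point the client at a different endpoint) rather than an architectural one.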
When you guys are building your microservices, one of the things you run into is a conflict with the don't-repeat-yourself principle in your code. How do you deal with those common things that you do in every microservice? Do you have a utility library, or do you just make each developer implement the same requirements? So with the common things that we do, a lot of that comes down to building a base Docker container that has those common elements built in. So this is where the layering of Docker containers can be really awesome. So if there are common elements, then I'll build some sort of very thin application layer that implements those common elements, and then I'll share that container so that I don't have to reimplement the code. That's the ideal scenario. Like everybody in the world, for as long as we've been talking about common elements, cut and paste still happens, of course, but we've gotten a lot better about it, I think, over time. But yeah, that's basically how we do it. Okay. Could you maybe share the biggest issues you met when containerizing your application? What were the worst cases you met along this road? Yeah, no, I understand what you're saying. I mean, in some ways I shared some of them. So containerizing the monolith as a single standalone one-container service, that was pretty easy. As I said, for a standalone service, there weren't a lot of issues. When we got into the highly available service, where now the containers need to communicate with each other at some level, a lot of it really came down to the fact that, while the system was robust against fairly standard network failures (that had been built in from much earlier, when we'd done the original hand-built, highly available versions of Artifactory), we still discovered that a lot of that robustness assumed static infrastructure. 
I had to be able to put in the IP addresses of all of the different nodes in my cluster and know them ahead of time. And as you go to container orchestration, as you say, that can get a little bit more complicated. As the orchestration tools get more sophisticated, interestingly enough, that's getting easier again, because the orchestration tools are getting good enough that I can actually start making some assumptions. I can't quite go as far as assuming static IPs, but I can at least make some pretty basic assumptions about how the networks are structured and how they relate to each other, and that's going to be passed in from the network layer of the orchestration system, and there are well-understood mechanisms to access and query that network layer. So in reality, setting aside those places where we had stuff hard-coded and we had to learn, okay, I've got to pass in this variable, or on startup I've got to go ask this thing what I should put here, the network layer was in some ways the easy part. This is what the orchestration system handles for you, as it were; it tries to make that as easy as possible. But, you know, it's always about the places where I assume that I understand my infrastructure. And even in an orchestration system, as much as they try to make the infrastructure clearly understandable, the level of understanding is not the same as with a static infrastructure. And the places where we always run into trouble are where I assume that I understand how these two things are going to connect to each other. I may not know what that is ahead of time, but I assume whoever is installing it understands it and knows how to make point A and point B talk to each other. 
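The fix for "IP addresses known ahead of time" is usually to read whatever the orchestration layer injects at startup instead of baking peers into config files. A minimal sketch, assuming an environment variable convention (the variable name is illustrative; each platform has its own discovery mechanism, such as DNS records or injected env vars):

```python
import os


def cluster_peers(env=os.environ):
    """Parse a comma-separated peer list injected by the orchestrator,
    e.g. CLUSTER_PEERS="10.0.0.5:8081,10.0.0.6:8081". Returns [] when
    the variable is absent, which a single-node deployment treats as
    'no peers'. Hypothetical variable name, for illustration."""
    raw = env.get("CLUSTER_PEERS", "")
    return [addr.strip() for addr in raw.split(",") if addr.strip()]
```

The design point is that the application makes no assumption beyond "the platform will tell me", which is exactly the discipline container orchestration imposes.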
And so those are the things that really get you pretty much every time on premises: suddenly, as you move to a PaaS platform, you're in a situation where you're hiding from the consumer the ability to alter those things if they're doing something weird that you didn't expect. And so you actually have to study that PaaS service and say, what are all the weird things somebody can do, and how do I account for them? Yes. I think I have time for one more question. Oh, no, I'm done. Okay. Thank you very much.