Yes, it's working. Excellent. I'm Fred Dallerupel; I work for EnterpriseDB as product manager for our Postgres Plus Cloud Database. We're not really going to talk much about the cloud database itself today, but rather about the challenges of getting it running on OpenStack. This is Dave Page. He gave a presentation last year, when we were setting out to port our cloud database to OpenStack, with a very thorough run-through of OpenStack's capabilities and features and how we planned to approach the port. This presentation is a follow-up to that. We're just finishing that work, about to go into a beta period, and we want to tell you about some of the things we encountered along the way and how we addressed them. You may not be in the same situation, but OpenStack is a very active project these days, with a lot of interest and more and more deployments, so if you're moving from some other cloud provider into that context, you may hear some things here that are relevant. Dave is going to help me out; jump in anytime you want to add anything to what I say. If you have any questions, we're happy to take them at any time; if Dave can answer better than I can, I'll have him do it, okay? You don't have to stand here the whole time, Dave; feel free to have a seat, it's up to you. There you go.

Dave's credentials and community participation are on the slide; if you don't know him already, you probably do from the community. What I'm going to go through first is a short reprise of what was discussed last year about OpenStack, for those of you who aren't familiar with it, to give you an overview of the structure and the challenge. Then we'll talk very briefly about the cloud database architecture, since that is what we are porting over, and understanding how it differs from other Postgres deployments matters for the rest of the presentation. Then we'll talk about how we approached the migration. We currently run Postgres Plus Cloud Database on Amazon, and we're going to stay on Amazon, but we want the same code base to support our OpenStack deployment, so there were some thoughtful approaches to making sure that one code base manages both equally well and also sets the stage for deployments on other cloud providers down the line. Some of the specific things we'll cover are identity management and governance, which follow a very different model on OpenStack than on Amazon, and customer-managed administration: on Amazon, we are the ones who administer how people deploy our cloud database, but when the product moves to OpenStack and runs in a private context, we won't have that access, so we're going to enable the customer to take on roles that we've handled before, and a number of interesting issues come up in that context. Finally, Trove often comes up as a question when we discuss OpenStack and our cloud database, so I want to give our current thinking on it and, if possible, head off any questions people might have.

So, OpenStack. It's an open source project useful for creating public and private clouds. There are some private clouds out there already built on it; we'll chat more about that. In some ways, it's similar to Amazon Web Services.
Amazon, coming first and being the biggest, has set a set of patterns that other cloud providers tend to follow: models for instance sizes and machine types, how you address storage, and various other characteristics are fairly consistent across providers. We are looking primarily at private deployments of OpenStack, though, and that's mostly what this talk is about; if we were just targeting something like Rackspace or the HP cloud, it would be a different presentation. It's a bit more challenging moving from the public to the private context. OpenStack, like so many things, is built up out of many different projects, modules of capability, and we'll do a quick run-through of them in a moment so you can understand, particularly if you know Amazon, what the corresponding pieces are. (Does anyone know how to get rid of that? Okay, thank you.) So I'll give a comparison: if you know Amazon, it will help you understand the corresponding capabilities in OpenStack. We're EnterpriseDB, so not surprisingly we focus on enterprise customers. Amazon is not necessarily suited or targeted specifically to enterprise customers, but when we get into the private context, the system is going to be deployed in people's data centers. That is clearly an enterprise challenge, and there are features they will expect in that context that we've never had to address on Amazon, so we'll go through some of those. In other words, we're not constraining ourselves to Amazon, and that's the challenge here.

Just a quick overview... oh, sorry, I got ahead of myself. Here's the comparison (there's my pointer; I even bought a pointer just for today). For storing virtual machine images, which Amazon calls AMIs, the OpenStack module is called Glance. For block storage, Amazon has EBS and OpenStack has Cinder. For compute, it's Amazon EC2 and OpenStack Nova. Networking support is part of the AWS platform; in OpenStack it's Neutron. Object storage, the even more highly redundant storage, is S3 on Amazon and Swift in OpenStack. And finally there's Trove, which I mentioned before; if you're not familiar with it, it's an abstraction layer for databases. The idea is that an application would not connect directly to Postgres but to Trove, and that level of abstraction could have different databases behind it.

That brings us to the use cases, the common, basic kinds of things people are going to be doing here; there are of course more detailed ones, but this gives you a general sense. Notice that there are two classes of actor, both on the OpenStack side. The first is the OpenStack administrator, and that role doesn't really have anything to do with our database specifically; the administrator actually has two jobs, administering the OpenStack system as well as administering the PPCD system. The second class is the OpenStack users, who have their own use cases. The OpenStack administrator, outside the context of PPCD, creates user accounts.
They also create tenants, which are groupings of users; we'll go into that in more detail, because it's a fundamental difference from Amazon. And they're the ones who create the database images for the customers who are going to be using PPCD on OpenStack. The OpenStack users, of course, are building applications that are going to use the database. In a self-service mode, they can select and deploy a database from the images the administrator has created and offered to them, and finally they attach their application to the database.

Now a quick overview of the architecture of our product, to give you a sense of the deeper challenges. Most of the products on the AWS Marketplace are single AMIs, single machine images that get deployed into a single virtual machine in Amazon; in fact, the whole marketplace is structured around that idea. We don't do it exactly that way. Instead of a single instance of the database, we deploy a cluster of databases, a separate virtual machine for each database instance. One acts as the master, and the others, and you can have any number of additional replicas, are read-only members of the cluster. There is a load balancer, pgpool, which sits on top of the cluster and which the client applications connect to, because we want read scale-out. The load balancer takes the incoming SQL requests and routes the read requests across the replicas, I believe using a round-robin algorithm, and routes the write requests directly to the master. So the default installation you'll see from deployments of our cloud database is one master and at least one replica: two instances cooperating. We also have functionality where, if you exceed a certain number of connections to the database, we automatically spin up another replica so we can distribute the incoming read requests across all of them, so there is an automated scale-out capability.

With all that in mind, when we sat down to change the code so that our database could run on OpenStack as well as Amazon, what we started with was a product originally created about three years ago that was built specifically to work on Amazon. That's overstating it a little: we did know we would end up in other contexts, so there is some abstraction there. But in general, if you're creating a product and you know your target, you're going to do some things that are specifically suited to that target, because you haven't had the experience of looking at other platforms. The little pictogram here is meant to convey that: some amount of code that is common across the functionality of the system, with some AWS-specific code mixed in. So how do you approach this? How do you add functionality for OpenStack while still maintaining the AWS support? We could have done a couple of things. We could have kept the common code the same and built specific vertical implementations for each target platform: AWS, OpenStack, and whatever else we look at someday. But then you don't get any leverage out of a single implementation that works across platforms; you get duplication of effort. That wasn't a good approach for us. Or we could have provided an abstraction layer, something like jclouds, which we do in fact use, that would completely hide the differences between the platforms.
That wasn't going to be an effective solution either, because there are differences between the platforms that we sometimes expose to the user, and we want to give some guidance on how to set things up properly in each environment. So that wasn't exactly the right approach. What we did instead was a kind of hybrid. We identified a set of primitive capabilities that are common across different cloud providers, and we'll go through some examples so you can see what I'm getting at. Instead of having code that is monolithically separate for AWS and OpenStack, we provide a set of capability primitives that let the code ask: is this capability available in the context we're currently running in? That's a predicate, a test you use in your code. Then, when you want to actually take some action, there is provider-specific code that carries out that particular thing you're trying to achieve within the context of the provider. Again, I'll show you some code that shows how this is done. This works well because there is significant overlap between the cloud providers; as I said earlier, Amazon has in effect set a standard way of doing a number of these things, so you find a lot of commonality across providers. Most of the capabilities are common across platforms, and only a few are different. What you find is that sometimes there's a slightly different model behind them, or perhaps just a different name, and you need some control over those differences. Having these primitives, and the ability to find out what's available in your context, lets you handle that in the higher-level common code.

For example, elastic IP addresses. Some of you are probably familiar with this, but one of the capabilities we use on Amazon, particularly during a failover operation, is this: you don't want the application to have to point to a different IP address to continue operating. Amazon provides what it calls an elastic IP address, which you can reassign to a different instance, so the IP address visible to the application doesn't change but always points to the currently operational database. That capability is available in OpenStack as well as Amazon. The next example is IOPS, dedicated performance in I/O operations per second. Let me take a step back. On Amazon, if you use the standard storage without provisioned IOPS, you get enormous variability in the response time of your I/O operations. It can vary by a very large factor; I probably shouldn't quote a number. In fact, we don't encourage customers to use non-IOPS instances unless they're not in production and are just doing testing or the like; you want very consistent performance. That characteristic is specifically called provisioned IOPS on Amazon. Now, and Dave can correct me if I'm wrong here, in the OpenStack context there are optimized instances, I believe, but they're just called something different. In OpenStack you can set up different machine flavors, where a particular flavor may have different capabilities; you might set up a flavor that runs on SSD drives on local storage. But that doesn't really apply to the EBS-style volumes so much.
The actual volumes we use for the main part of the storage in PPCD are Cinder volumes in OpenStack. With Cinder you can have different regions as well as different volume types. They're not IOPS-specific the way Amazon's are, where you can specify that you want some predetermined number of guaranteed IOPS for a particular volume; you just specify that you want a particular volume type, and behind the scenes the storage server used for those volumes might be set up on very high-speed SSDs, for example, instead of magnetic drives. And as I mentioned, there's a similar aspect on the machines themselves, because they can have local storage too. The last example, which I mentioned before, is tenants, a concept that exists in OpenStack and not on Amazon at all; we'll go into that in more depth in a moment.

So there are very few times when our code needs to understand, in a general sense, which platform it's on. That generally only happens when the system is starting up, during bootstrapping operations and so on. Once it's up and running, we have a piece of code, a provider interface, that holds all of the provider-specific information; after finding out at startup which provider you're on and setting things up, you move over to the provider interface and make lower-level requests through it. Here's an example of using the predicates and action capabilities I mentioned (a rough sketch of the pattern follows below). You can ask this provider instance, which has been set up to be the provider we're running on, whether it supports initial database profiles, and if so, you go and export that configuration information. The actual code in the provider interface that gets executed by that statement looks at which machine type the customer has requested and sets up the proper effective use of RAM for that instance according to that machine type. Because that's different on Amazon than on OpenStack, even though they have similar kinds of machines, we can optimize it for each environment individually without changing the common code, just by extending the provider interface. Another quick example: as I mentioned earlier, sometimes the differences between the systems are not functionality but language, since Amazon refers to some things differently than OpenStack does, and we use the same mechanism for that. For example, if the provider supports failover of the primary to a master and also supports failover to a replica, then we build an option group that we offer to the user in the user interface (that label should probably be "provider option group," but anyway), and we use that in the UI. So we can handle not just different functionality but also presenting the proper terminology to the user for whichever environment we're in.
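To make that pattern a bit more concrete, here is a minimal sketch, in Java, of what such a capability predicate plus provider interface might look like. This is not our actual code; the names here (CloudCapability, CloudProvider, exportInitialDatabaseProfile and so on) are illustrative assumptions based only on what I've just described.

    // Hypothetical sketch of the capability-predicate pattern described above.
    // These names are illustrative, not the real PPCD classes.
    enum CloudCapability {
        ELASTIC_IP,            // reassignable address (EIP on AWS, floating IP on OpenStack)
        PROVISIONED_IOPS,      // guaranteed I/O rate on AWS; approximated by volume types on OpenStack
        TENANTS,               // OpenStack only
        INITIAL_DB_PROFILES    // per-machine-type tuning profiles
    }

    interface CloudProvider {
        String name();
        // Predicate: is this capability available in the context we're running in?
        boolean supports(CloudCapability capability);
        // Action: provider-specific behavior behind a common call.
        void exportInitialDatabaseProfile(String machineType);
    }

    class OpenStackProvider implements CloudProvider {
        public String name() { return "OpenStack"; }
        public boolean supports(CloudCapability c) {
            // Everything except AWS-style provisioned IOPS is available in this sketch.
            return c != CloudCapability.PROVISIONED_IOPS;
        }
        public void exportInitialDatabaseProfile(String machineType) {
            // Set effective RAM / tuning for this flavor; the details differ from the AWS provider.
            System.out.println("Exporting tuning profile for OpenStack flavor " + machineType);
        }
    }

    class ProvisioningStep {
        void configure(CloudProvider provider, String machineType) {
            // Common code never asks "am I on AWS or OpenStack"; it only asks about capabilities.
            if (provider.supports(CloudCapability.INITIAL_DB_PROFILES)) {
                provider.exportInitialDatabaseProfile(machineType);
            }
        }
    }

    public class CapabilitySketch {
        public static void main(String[] args) {
            new ProvisioningStep().configure(new OpenStackProvider(), "m1.large");
        }
    }

The point of the predicate is that the common code only asks about capabilities, so adding a third provider later means implementing the interface rather than touching the common code, which is exactly the leverage we were after.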
Let's talk about users and tenants for a moment. On Amazon, you need to have an AWS account; you'll also have an account with our cloud database. You first have your AWS account, that becomes your user identity in Amazon, and it gets linked to your account in our cloud database. You can have multiple people within an account on Amazon, but that's about the extent of the model. On OpenStack, there's the concept of a tenant, which is an aggregate, a composition of things. Specifically, a tenant has its own virtual network; it has some number of volumes associated with and available to it; it has images, the machine images the administrator has created for people; and it has keys and also users. The idea is that you want all of these things available to a group of people under a tenant because they all work together: these people are doing this kind of work, these are the things they need, these are the images the administrator created explicitly for them, presumably to meet their business needs. By the way, tenants are not currently hierarchical; there is some work going on to make that so, and that point becomes relevant in a moment.

The way we're modeling this, and I think this is fairly standard now, although I haven't surveyed all of the OpenStack deployments, is that tenants map to organizations or teams within a company. As I mentioned a moment ago, you want to include inside a tenant people who work with the same computing tools and who are billed jointly, so any member of that team can use the same things and they get billed as an aggregate. Development teams are a good example. If you're going to do dev/test in the cloud, maybe there's a specific set of software you want to use: one version of the database has been agreed upon for that development effort, plus some other supporting resources, all created and made available within a specific tenant. You could have a different tenant, another development group, with a different version of the database and a different set of services. Each group knows it will get the things that have been designated for its use, different from the other group's, and they don't have to worry about it; they're limited to what they can see, so they can't make a mistake in what they choose or how they use it. I mentioned development teams, but it could be a marketing department or any other part of the organization. Modeling the organizations within the company as tenants is very useful for the pragmatic structure of what you deploy to them and how they work with it, and it also helps with some other things we'll get to in a moment.

Okay, the next topic is billing. One of the enterprise expectations is that different organizations are responsible for their own budgets and what they spend, and IT may well be providing services for them, so you need a way of billing the right organization and making sure the cost comes from their budget rather than someone else's; it's not all IT paying for everything. Having tenants model those organizations, mirroring how businesses separate their billing structure, makes it very easy for this to fit into their context. One of the things we've had to address is that on Amazon, we don't do any billing. Marketplace products all work like this: you register on Amazon, you list your product, you describe the pricing you want to offer customers; Amazon then watches how long your instances run, meters that on an hourly basis, and does all the billing to customers on a monthly basis. So we have kept records of when instances are created, when they're destroyed, how long people run things, which person did them, and so on, but we've never had to do billing from that.
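Just to illustrate the kind of per-instance record I mean, the sketch below shows roughly the shape of the data involved. The field names and the hourly rounding rule are my own illustration here, not our actual schema.

    import java.time.Duration;
    import java.time.Instant;

    // Hypothetical shape of a per-instance usage record; the real PPCD records may differ.
    class InstanceUsageRecord {
        String instanceId;     // the cloud provider's identifier for the VM
        String tenantId;       // which tenant/organization (OpenStack) or account (AWS) owns it
        String requestedBy;    // which user performed the operation
        String machineType;    // flavor / instance type, which drives the hourly rate
        Instant createdAt;
        Instant destroyedAt;   // null while the instance is still running

        // Billable hours so far, rounded up the way hourly metering typically works.
        long billableHours(Instant now) {
            Instant end = (destroyedAt != null) ? destroyedAt : now;
            long minutes = Duration.between(createdAt, end).toMinutes();
            return (minutes + 59) / 60;
        }
    }

    public class UsageSketch {
        public static void main(String[] args) {
            InstanceUsageRecord r = new InstanceUsageRecord();
            r.instanceId = "i-123";
            r.tenantId = "dev-team-a";
            r.machineType = "m1.large";
            r.createdAt = Instant.now().minus(Duration.ofHours(5)).minus(Duration.ofMinutes(10));
            System.out.println("Billable hours so far: " + r.billableHours(Instant.now()));
        }
    }

On Amazon a record like that is informational for us; in a private cloud it becomes the raw material a charge-back system would consume.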
When this goes into an enterprise, into a private cloud, the enterprise requires the ability to handle that kind of charge-back operation. We're not going to provide a billing package ourselves; what we want to do is enable it. Ceilometer (am I pronouncing that right? It's one of those things I only ever read) is the standard metering tool in the OpenStack context, and it's where those kinds of metrics are held. We intend to provide an interface to that, a standardized billing interface, so that you can select your own billing system, plug it into the back end, and have the system feed the information through to it.

I mentioned earlier that on Amazon, we are the administrator of the system. When you deploy your cluster of databases, we do that from our PPCD console. We run one in each of the Amazon regions, and we deploy instances on behalf of the customers; all the instances belong to the customers, but we operate a multi-tenant console for managing the deployments. That will be true on OpenStack as well: instances are deployed on behalf of the customer, and they are private instances, not owned by us. So, a couple of points here. Yes, we're the administrator on Amazon. On OpenStack, it's a private context, so obviously we're not going to be running that console; the console will still be there, but it will be run by the customer. That gives us some challenges, because we're very used to the Amazon environment where each region has the same set of services and the same set of operations; the price may differ between regions in different parts of the world, but it's a very consistent situation. In that sense, we think of OpenStack deployments as regions in their own right. They're separate islands, if you will, of private cloud where our product will exist, and they're not as uniform as Amazon regions, because when you deploy an OpenStack private cloud you go through a whole set of configuration choices, deciding which services to use and a whole range of capabilities and characteristics. It's a very flexible environment. That's a challenge for us, because we don't know ahead of time exactly what context our product will be running in or which services will be available; I'll mention how we're going to address that in a moment.

It goes further, though. On Amazon, customers create their own AWS accounts; that's a prerequisite we're not involved in. On OpenStack, the administrator creates those accounts and constructs the tenants. So this is a new set of capabilities, and we need code to accomplish it, something that doesn't exist on Amazon. It's a very different model, tenants bring additional functionality, and it's a big area where we've had to think through a number of issues.
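For a sense of what that administrator-side work looks like when driven programmatically, here is a rough sketch against the Keystone v2.0 admin API of that era. To be clear, this is not PPCD code: the endpoint, port, credentials, tenant and user names are placeholders, the request bodies are written from memory, and a real client would parse the JSON responses properly (or use an SDK), so treat it as an illustration of the model rather than a reference.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Sketch: an OpenStack administrator creating a tenant and a user for it.
    public class TenantSetupSketch {

        static String post(String url, String token, String json) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            if (token != null) {
                conn.setRequestProperty("X-Auth-Token", token);
            }
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println(url + " -> HTTP " + conn.getResponseCode());
            InputStream body = (conn.getResponseCode() < 400) ? conn.getInputStream() : conn.getErrorStream();
            return (body == null) ? "" : new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws Exception {
            String keystone = "http://controller:35357/v2.0";   // assumed Keystone admin endpoint

            // 1. Authenticate as the OpenStack administrator to get a token.
            //    The token would be parsed out of this JSON response in real code.
            String tokenResponse = post(keystone + "/tokens", null,
                "{\"auth\": {\"passwordCredentials\": "
              + "{\"username\": \"admin\", \"password\": \"secret\"}, "
              + "\"tenantName\": \"admin\"}}");
            String adminToken = "...token parsed from tokenResponse...";

            // 2. Create a tenant for, say, a development team.
            post(keystone + "/tenants", adminToken,
                "{\"tenant\": {\"name\": \"dev-team-a\", "
              + "\"description\": \"Dev/test group A\", \"enabled\": true}}");

            // 3. Create a user inside that tenant (the tenant id comes from step 2's response).
            post(keystone + "/users", adminToken,
                "{\"user\": {\"name\": \"alice\", \"password\": \"changeme\", "
              + "\"tenantId\": \"...id from step 2...\", \"enabled\": true}}");
        }
    }

In practice an administrator would more likely do this through Horizon or the keystone command-line client; the point is just that grouping users under a tenant is an explicit, administrator-driven step with no equivalent on the Amazon side.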
And finally, another difference between them: on Amazon, we're the ones who create the database images we deploy, generally corresponding to the different versions of Postgres we support. EnterpriseDB supports not only community Postgres but also our own proprietary Postgres Plus Advanced Server, which has additional characteristics such as database compatibility with Oracle, so we provide AMIs, virtual machine images, containing those different products and versions. In the OpenStack context, we're not going to be creating those images; the customers themselves will create them for their own users within the organization. Yes, they'll be choosing databases and versions and so on, but they may also be doing very custom things; there may be other services they want to include in those images, and of course on OpenStack they'll be doing much more than just databases, so those administrators have to think about applications, services provided to people in the organization, and consistency across all of it. It's a significant task for us just to maintain images for the latest versions of Postgres, and it's going to be even more challenging for administrators in the OpenStack context, so we want to help them out and understand what it takes to do that.

I think I went through most of this already. I'm not sure I mentioned drivers, but there are a variety of things that can differ between regions and between OpenStack deployments. Because there are so many unknowns, one of the ways we're going to address this is with an upcoming beta program for our OpenStack product, where we get experience with customers, potential customers, people interested in participating who have different OpenStack deployments and configurations, so we get real-life experience of the variance between these systems and make sure our product is robust in all of those contexts. As a comment on that: on Amazon, one of the biggest challenges for us has always been hardening the system, making it robust in the face of variance in how Amazon processes things. It could be outages, temporary restrictions on deploying instances, or availability zones that have filled up and become unavailable. All kinds of things can happen, well beyond what you might ever run into deploying a single instance on a physical or virtual machine in your own data center. So we'll use the beta program to get experience with that, and we're still looking for people who are interested in participating. We also want to use it to understand what expectations people have for additional enterprise features; I mentioned billing, and I'm sure there are others we could speculate on, but we want first-hand experience with customers who have real requirements and can drive the continuing expansion of our enterprise support.

One other thing we're doing this time around, which this reminds me of: the way we create our instances on Amazon today is that we take a virtual machine, install Postgres on it, configure it, set it up, and then capture that as an AMI, which Amazon then propagates out to all of the regions we support.
That approach performs well when you deploy, but it's been very challenging in terms of flexibility for updating software. When Heartbleed came along, we needed to spin an entirely new AMI to address it. We can do it, and we can do it quickly, but it's a heavyweight process. What we're moving to is an RPM-based approach for the virtual machines created by the administrator in either context, since the work we're doing for OpenStack will also come back to Amazon where appropriate. It's a much simpler process. You basically create a base machine that has the essentials, perhaps some configuration and so on, but instead of the software already existing on the image, at deployment time the system uses yum, for example on a CentOS instance, to install the software from RPMs. Upgrades are then simply another RPM update, and Heartbleed could have been solved by an update of the OpenSSL RPM. This is going to be a great simplification, give us a lot more flexibility, and make things easier on the administrators.

We're in the last ten minutes here, so let me quickly talk about Trove, because as I said, we often get asked about it. Trove was originally built around MySQL and supported the capabilities of MySQL; at the beginning it didn't understand clusters at all. Remember, it's an API intended to drive basic cloud provisioning functionality for a database. With our cloud clustering capabilities, not having clustering in Trove would be a significant downside. Trove has since evolved to include some clustering support, but it still has a ways to go. So when we get asked whether we can use Trove, we want to make sure that all of the capabilities of what we do are represented in it, because we don't want important functionality to be unavailable to people. We go back and forth in these discussions, and there are some alternatives here; if you have any feedback, I'd love to hear it. We could unilaterally extend Trove to include the functionality we need people to be able to control. Or, as part of our OpenStack port, we've also created a RESTful API to our system, so you can have programmatic control over all of the operations you'd want to perform with your cloud database; you can do everything through the GUI, but with the RESTful API you don't have to, and the programmatic interface gives us a lot of flexibility for integrating into enterprise management solutions, DevOps tooling, all kinds of things. So we could tell people not to use Trove and to use our management API instead, but that removes the benefit of a standard API that you wouldn't have to change someday. Those are the trade-offs. At some point I presume Trove will be much more capable, and as I understand it there is an extension mechanism in it, so we could make some of our functionality available even where Trove doesn't support it in a standard way. Again, I'd love to hear any feedback on that. And I think that's it for the presentation; we'll leave a little time for questions.

[Audience question, partly inaudible] What about support for the different distributions of OpenStack? Say I'm running OpenStack from a particular vendor...
So we chose to port first to community OpenStack because we thought that was the more neutral platform, and part of the beta program is going to be determining which distributions we support first. In time we want to support as many as we can; we're probably leaning toward Red Hat, Ubuntu, or Mirantis, but we haven't finalized the order, and we'd love feedback on that as well. That's the first release, and we'll follow it as quickly as possible with at least the first distribution, if not more than one. To take a step back: by starting with community OpenStack, we expect Red Hat, for example, to just work, because Red Hat doesn't tend to change much from the community version. We then expect Ubuntu will probably work with a little effort, and Mirantis similarly. The idea is that we start on neutral ground and work our way out from there, and as part of the testing we want to see what does and doesn't work. One of the big problems for us, if you're familiar with OpenStack, is this: there's a project called DevStack which lets you run basically an entire OpenStack system on one machine. That gets difficult when you're running a GlassFish server for the PPCD console plus multiple database servers at once; it's hard to do that in DevStack. So one of the big challenges for us, as Fred mentioned earlier, is that we've got all these possible environments to run in: different distributions of OpenStack, different configurations. For example, one customer might be using Cisco Nexus switches talking directly to Neutron, whereas we're using Open vSwitch on our systems. It's very, very hard for us to build enough OpenStack installations to do all that testing ourselves, and that's going to be a problem for anyone writing an application that has to integrate with a version of OpenStack. We could spend millions of dollars putting together all the different variations of SANs that could be used for storage, and so on. So this is very much why Fred was talking about the beta program as one of the ways for us to iron out not just what will and won't work in different OpenStack configurations, but also across the different distributions.

OpenStack has availability zones as well. They don't mean a huge amount in OpenStack; they're really just a grouping of compute nodes. What we do in PPCD is round-robin across all the available availability zones when we're deploying new nodes, so we try to spread everything across all the zones available in your particular installation. You can restrict what it can and can't use by the way you group compute nodes together into different host aggregates and what you make available to different tenants and so on, but all we do, very simply, is try to spread the read replicas across as many different zones as possible.

[Audience question, partly inaudible, about the database nodes ending up on the same network and what that means for isolation.] Well, that's very much down to how you configure OpenStack in your environment. In our test lab, we've got two different racks, for zone one and zone two, that we use for testing, and they happen to be on the same network, but they don't need to be. As I say, that's very much an OpenStack configuration thing.
And it's down to how you design your network topology and how you separate the nodes in the different zones from one another. Any other questions? Well, thanks everyone. Really appreciate your attendance.