Okay, so welcome everybody, we're going to go ahead and get started. We're here to tell you about OpenStack at MyMedia, a scalable consumer media company. First we'll introduce ourselves: I'm Michael Yoon, and with me are Peter Chen, our director of DevOps, and Charlie Corona, our systems architect. We want to share our story: how we started to use OpenStack, why it's applicable at our company, and some of the lessons we've learned that we hope are applicable to your own use cases. So let's start with a prologue. Who is MyMedia, and what do we do? MyMedia helps you rediscover your digital memories. We're a startup based in Brooklyn, we currently support about 100,000 users, and we recently signed some fairly significant global partnerships, so we're on track for tens of millions of users in the next 12 months, which is driving a need for about an exabyte of storage in that same period. So this is a pretty exciting story for us. Helping people rediscover their digital memories sounds simple, right? It could be something as basic as just storing and retrieving your memories. But if we dig a little deeper, it becomes a pretty complex problem, because it's about mimicking how humans think and how we remember things. So MyMedia provides services not just for storage and retrieval, but around things like object recognition, transcoding of videos, optimization of things like thumbnails, and the relationships between metadata, allowing people to organize their data, tell stories around their memories, share their memories, things of this nature.
So that sets up what our product is, and next we needed to choose a technology that fit our business needs. Our first business need is low-cost object storage. At heart, we're a freemium service, and we want to build it on commodity hardware, so the price per gigabyte is one of the major factors for us; the other is that the data we're talking about is object-based, not block-based. So low-cost object storage is the first requirement. The second is that we have highly variable workloads. As I mentioned, we do a lot more than just storing and retrieving data, including things like video transcoding and facial recognition, and these different tasks have a wide degree of variation in latency. The nature of our workload is also quite bursty: people will sometimes upload tons and tons of content from their desktops, and other times it'll be one photo a week as they take them. A third characteristic is that we need to operate at consumer web scale, with very rapid and bursty growth. As I just mentioned, we recently signed some large partners on top of an organic growth strategy. What that means is that our CEO comes to us one day and says, okay, we signed another deal, and now we need to expect another 10 million users over the next couple of months. So we need to respond with a very, very scalable architecture that can handle this kind of rapid, but also bursty, growth.
And then finally, because it all comes down to total cost of ownership, we also need to take advantage of not only agile software development but agile infrastructure development. We want to ride the rapid adoption of innovation in infrastructure. So based on these four primary business needs, we chose OpenStack and SwiftStack. And now Peter's going to tell you a little bit about the early days of MyMedia and OpenStack.

Hello everybody, just a show of hands: has anybody deployed OpenStack already in their environment? Anybody planning an OpenStack deployment in the next 12 months? Okay, fun. So we have a lot of people who have gone through the wonderful journey of figuring out how to implement OpenStack correctly. In the very early days, hardware, choice of components, networking, and upgrading were always issues with OpenStack. Our first reference architecture, as you can see, was our first iteration of deploying OpenStack. It's a very, very high-tech environment; not too many people that I know have done this before. As you can see, we have our OpenStack environment, which does our compute and our Neutron, and we have our USB drive that does our Swift. So this also proves that you can deploy OpenStack on almost any environment, any hardware; we have pretty much proven that. I hope your environment will be a little more robust going forward. So we started out pretty early in the OpenStack days. Let me get another show of hands: who really likes setting up Neutron? Nobody? All right. Part of our network struggle has always been how you set it up correctly and how you keep it running smoothly. At the very beginning, GRE tunneling was always a heavy, heavy beast; there are reports out there that GRE tunnels are about 30% slower than your normal VLANs and VXLANs.
So we went through a few iterations of how to set up Neutron properly and how to get around all the latencies. The other thing is documentation. I guess everybody has Googled quite a bit while setting up their own OpenStack, and you can probably see there are about a million people with a million different ideas on how you should set things up. So that's always the interesting part: there is a lot of documentation, but it is always very specific to that person's point of view when they were deploying. It's always a fun exercise, and it gives you a lot of time to look at Google's logo and see what's happening that day. The next thing is hardware: exactly how do you set up your OpenStack hardware? You have a million reference architectures that go from one single node to a forest of a hundred servers. How many Neutron nodes should you put in? How many compute nodes? And what kind of Swift environment should you set up, how many proxies, how many storage nodes? Those are all the fun things we went through during our growth period. And high availability on Neutron back then was always fun; it's very hard. You have one controller handling everything, and with the GRE protocol on top, that always makes things a little slower, and one failure will leave you wondering what exactly happened. In the old days, the hardware would go into kernel panic a few times, and since it's Neutron, your entire environment goes down. That's the fun part. Then the next thing is performance tuning. Going back again: whose document do you read? Who is the de facto person to talk to about performance tuning? Those are all things that come down to trial and error. You have to figure out exactly what your needs are, and how many times you can afford to fail, to figure out the right thing for you to do. So, all the failures have been laid out.
Michael and Charlie here will tell you about our current infrastructure.

Okay, so just before we get into what our current production looks like, I want to tell you a little bit about chapter two, which is how we customized our architecture for our application and vice versa. What you see in front of you is a brief high-level summary of what our application does, broken down into logical components. I'd like to walk you through two examples of how we needed to configure our application for our architecture and vice versa. The first example is a scenario where a user is uploading a large video. If they're doing this from their phone, it might be over Wi-Fi, it might be over cellular, and we want to upload it in pieces so that they don't lose content if they go in and out of Wi-Fi coverage. This is what we call partial uploads, or segmented uploads. We originally designed this so that the partial uploads were stored in local storage, but we realized that was not very stateless, so it didn't fit the scalability requirements of our application servers. We then explored other solutions like Memcached and Redis. And here's an example of where we finally said: you know what, we're already using Swift, and Swift has capabilities for large objects using manifest files. So what we ended up doing is using that capability of Swift to support our partial uploads, where we have specific containers to store the segments and their metadata, with specific storage policies, including things like expiration, a time-to-live on the partial data, as well as less redundancy, because we don't need to store segments with quite the same redundancy as the original files. A second use case is an example of how we support video streaming, which is a very large and important part of our product.
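To make the segmented-upload idea concrete, here is a minimal sketch of how partial uploads can map onto Swift's static large objects (SLOs). The segment size, container name, object naming, and the 24-hour expiry are illustrative assumptions, not MyMedia's actual values:

```python
import hashlib
import json

SEGMENT_SIZE = 5 * 1024 * 1024  # 5 MiB per segment (illustrative choice)

def make_segments(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Split an upload into fixed-size segments, as a client would
    before PUTting each piece to a segments container."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

def build_slo_manifest(container: str, object_name: str, segments):
    """Build the JSON manifest for a Swift Static Large Object.
    Each entry names a segment object plus its etag (MD5) and size,
    which Swift verifies when the manifest is PUT with
    ?multipart-manifest=put."""
    return json.dumps([
        {
            "path": f"/{container}/{object_name}/{idx:08d}",
            "etag": hashlib.md5(seg).hexdigest(),
            "size_bytes": len(seg),
        }
        for idx, seg in enumerate(segments)
    ])

# Headers for the in-flight segments: expire them automatically if the
# client never finishes the upload (X-Delete-After is in seconds).
PARTIAL_UPLOAD_HEADERS = {"X-Delete-After": str(24 * 3600)}
```

Once the client finishes, the application PUTs the manifest, and the partial state lives in Swift rather than on any one application server, which is what keeps the upload tier stateless.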
So we transcode files when they get uploaded, we prepare chunk lists, and we prepare a couple of different optimized copies. In order to implement video streaming efficiently, we need byte-range requests from our underlying object storage layer. This is another example where it was important for us to ensure that the underlying object storage technology, in this case Swift, supported byte-range requests, so we can avoid moving more than necessary back and forth across our east-west network. So in general, under this chapter of customizing the application for your architecture and vice versa, there are three primary lessons we needed to focus on. One is: design for Swift. One of the main characteristics of Swift is that it's an object store with eventual consistency. It's important for our application not to treat it like a relational database, where you can always instantly get back exactly what you stored; sometimes the data you just wrote may not come back yet, and you have to write your application to retry in those cases. It's also important for us to design for networking. In our particular case, we move a decent amount of data back and forth from our storage layer, so it's important to architect our network to support as much east-west traffic as possible and to reduce the amount of north-south traffic out to the internet. Thirdly, and this is just a general design principle, we want to design for virtual compute, to satisfy our need for variable workloads, which means we need to be as stateless as possible.
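Two of those lessons, byte-range reads and retrying under eventual consistency, can be sketched in a few lines. This is not MyMedia's code; the backoff parameters and the `fetch` callable are illustrative assumptions:

```python
import time

def range_header(offset: int, length: int) -> dict:
    """HTTP Range header for reading one chunk of a video straight
    from object storage, avoiding a full-object GET."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

def read_with_retry(fetch, retries=5, initial_delay=0.1, backoff=2.0,
                    sleep=time.sleep):
    """Retry a read until the object is visible. Under eventual
    consistency, a GET right after a PUT may miss on a replica that
    has not caught up yet, so treat 'not found' (None) as retryable."""
    delay = initial_delay
    for attempt in range(retries):
        result = fetch()
        if result is not None:      # object (or a fresh copy) is visible
            return result
        if attempt < retries - 1:
            sleep(delay)            # exponential backoff between tries
            delay *= backoff
    raise LookupError("object not visible after retries")
```

The point is simply that the retry lives in the application, not in the store: Swift is allowed to say "not yet," and the client is written to expect that.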
So we want to make sure that each one of our application servers, the uploading servers, the transcoding servers, the video servers, the API servers, are all as stateless as possible. Let me give you a couple of quick examples of how we designed specifically for Swift, just taking that top layer, the first bullet. When we talk about a user, and we're talking about an end customer here, we needed to decide whether this user would be represented in our infrastructure, in our application, or at our Swift layer. Because there are accounts, and then there are accounts: Keystone has a concept of accounts, OpenStack has concepts of accounts, and Swift has concepts of accounts. So we had to decide whether a user account would be a Keystone account or a Swift account, and our actual answer is a bit of a hybrid. In our implementation, we have a single Keystone account, but we have a one-to-one correlation between a Swift account and an end user, and then we use the special reseller account inside Swift to perform authentication, so that our application is the one that ultimately does all the authentication. A second example is Swift access. One optimization that we considered was having our customers directly access the Swift object layer. Ultimately, we decided against it, because it mixed our business logic across both the infrastructure layer and the application layer. When people decide to share things, or to restrict others from viewing things, we could have had clients hit the Swift object layer directly, but we ultimately decided that this would mix and match where we host our business logic.
It could either be in middleware in Swift or in the application layer, and that was getting pretty confusing. Another example is the choice of containers and how we distribute them. These containers are not the same thing as Docker containers or application containers, which are another hot topic in OpenStack; these are containers within Swift, which are basically buckets or folders (the S3 equivalent would be buckets). With containers and storage policies, we needed to choose how they map to our application functionality. In our case, we decided that there are certain containers, with certain properties such as storage policies, that allow us to treat different types of storage differently. So we can treat video files, which are large, differently from music files, which are small and for which we don't make multiple copies at different resolutions, versus thumbnails, versus original photos, things of this nature. Okay, so now Charlie will talk about what our current production environment looks like.

So we had this big problem: we had an existing OpenStack environment, and it was built on top of a house of cards. We have this wonderful application, and underneath it this thing that reboots every once in a while and has all sorts of issues. And this house of cards was nothing like the TV show House of Cards; it was just 52 cards, and it would fall down every once in a while. So we had to make a decision. We said: people are not going to use an application that's not stable, right? So we went on a little venture. We said, all right, we don't have a lot of OpenStack knowledge among the people who were in the organization at that point in time. So what are we going to do? After a little introspection, we decided that, well, we could try to do it ourselves, and mess it up, and keep trying things.
But we said: we really want people to see this application. So that's when we decided to start partnering with people, possibly with the eventual intent of taking it over ourselves once we'd learned enough to handle the environment. What we decided on was to partner with Canonical for the OpenStack part and SwiftStack for the Swift part, since Swift is such a big part of our application, and they're a lot of smart people to be dealing with. So we're basing ourselves on Ubuntu 14.04. We're using Percona XtraDB for our MySQL back end. We have two racks right now, not a huge environment, but it's got about 1.5 petabytes worth of disk for our Swift nodes. We're using New Relic and a whole bunch of different monitoring tools, because if you don't monitor your application, you don't know what's going on and you don't know what your customers are seeing, right? That's a big thing. And we also run a bit of a mix: some LXC containers here, some KVM there, and OpenStack in general. So we decided what problem we were trying to solve, and the big thing was: get this thing stable. We had to look back and ask, what is it that we really want to do? What do we want from OpenStack? We looked at some other options: do we do straight KVM? Do we do VMware, that kind of stuff? We started by asking what really matters to us, and in realistic terms, Swift is what really matters to us. We could have run the applications on straight KVM or something else, but we decided, hey, let's use the whole OpenStack thing, because it's pretty cool and it's up and coming, right? Then: who in your organization can take care of these things? OpenStack folks, there are not a lot of them out there. I mean, we're in New York, and when you search for people who know OpenStack, either they want $50 million a year to take care of it or they don't have enough skill.
They've opened the same manual that you did. And it comes down to how hands-on you want to be. Some people decide: hey, you guys go take care of everything, call me when it's done, and we'll make sure our applications are running on the stack, including Swift and whatnot. We decided that we want to be a little more hands-on than most people, because we want to actually start gaining knowledge so we can possibly take this over at some point in time. But again, think about your priority: our priority is to keep our application running, not necessarily OpenStack. And this goes back to how long you want support, which varies by distro. At first we decided, hey, for OpenStack, maybe we'll do six months, and we can always extend it a little more. And how much help do we need with the handover? That's something we still need to discuss. Is it going to be, hey, just hand us the book and we'll go, or, hey guys, you need to walk us through one cycle of an upgrade? We deployed Juno and we want to go to Kilo, right? Show us how to do that, because that's always a magical time, as far as I understand, in the OpenStack world. So, choosing a distro: there are a bunch of good folks out there, and this is where it comes down to understanding what you and your platform need. We looked at Mirantis, we looked at Red Hat, we looked at Canonical and Ubuntu, and Rackspace. We spoke to a great bunch of folks; they were all very helpful, and they obviously wanted to sell us stuff, but we got a lot of information out of them. It really came down to: what does a distro give us? What do we want out of this? We found that Canonical fit us because they gave us more than just a distro. We were basing ourselves on Ubuntu already, so we said, well, that's cool.
So these guys know OpenStack and Ubuntu. And then we took a look at some of the other things they were doing, such as Snappy. Snappy intrigued us; we said, maybe down the road that Atomic, Snappy, CoreOS kind of thing is a cool way to do deployments, rather than the "here's my WAR file, go deploy this" kind of thing. And do they have any other connections? We wound up finding out about a whole bunch of new people we could connect with, get more information from, and network with: hey, did you ever do this before, did you ever do that before? That's also how our relationship with SwiftStack came about, and they've been great as far as helping us out. So, how we did it. Again, this goes back to understanding what you want to do. All you hear about in OpenStack is: you need tenants, you need VXLANs, you need this, you need that. And we sat there and said to ourselves: I don't think we're going to have developers at this point that we want spinning networks up and down; we're a small organization. We wanted it very, very simple. We said, we don't care about tenants; if we have one tenant and one flat VLAN, that's perfectly fine with us. Somebody said to me, well, you only have 4,096 VLANs to go through. Okay? It's going to take me 20 years to get through those; I'd have to start making up VLANs just because. And then your network guys want to kill you: really, that many VLANs? So that's what we decided. And we don't want to be cookie-cutter. Sometimes any of the distros will say, here, we're going to deploy it this way, and if you don't know enough about it, they'll just deploy it that way, and you'll find out later: huh, I really would have liked that feature instead of this one. So it's good to know exactly what you need for your deployment.
And you don't want to deploy it exactly like everybody else. Sometimes that causes a little pain, because you want to go back and forth and figure out what's needed and get on the same page. But hey, that happens in everyday relationships, right? It takes a lot of communication to get through these things, and that's the most important lesson: don't do cookie-cutter, and make sure you know what you want to do. The other thing is: don't pigeonhole yourself. Make sure you're thinking outside the box about whether OpenStack, as it is, suffices for you. Do a lot of reading about it; you may not understand how to install it yet, but you'll know more about it, so you can make educated decisions. And make sure that everybody on your team knows what's going on. That's also important, because they may have ideas that help in the long run. At this point, I'd like to bring Michael back up to talk more about storage.

Okay. So right now, our current production environment supports 100,000 users on about 1.5 petabytes of storage. But as I was mentioning, our CEO loves coming in and telling us about another deal he's signed, so we really do have an interesting task in front of us, because we have signed a couple of partnerships at the global level. I've brought a couple of toys as examples. This is an iON camera, similar to a GoPro, and their marketing department tells us they'll be selling about 1.2 million of these in the next year, with MyMedia as the cloud provider. Here's another example: this is a phone produced by Micromax, the largest phone manufacturer in India. We have a very interesting partnership with them, where the MyMedia application is the native gallery and camera application on the phone.
And back in February, I think, they ran a flash-sale model: they sell about 30,000 phones a week in flash sales that sell out in less than 30 seconds. So we're talking about millions and millions of phones that they plan on selling over the next 12 months. These kinds of things lead us to plan for exabyte capacity, and to make sure that we have chosen the right architecture for both the storage and compute layers. The sorts of things we talked about apply here: making sure your architecture is stateless, and choosing underlying infrastructure like OpenStack and Swift that allows for this kind of linear scalability. We also have to think about things like global deployments, and this gives us another reason for choosing a technology like OpenStack: it supports a number of different deployment strategies. Right now we're in a co-located data center with our own hardware. We've designed a hardware kit that's built out over a number of racks, and we've configured it in such a way that we know how to order the next ones, install them, and grow our capacity. But we also need to be able to extend that through our partners, so we need to understand who we can use internationally and what kinds of models they provide. It could be something like Rackspace, which is a cloud service; it could be something like SoftLayer, which provides a hybrid model and also lets you rent bare metal; or it could be a physical data center where, same as now, we purchase the equipment, build it, ship the racks, and then install and configure it ourselves. These are all models that a technology like OpenStack gives us the freedom to choose between, or to combine in a hybrid.
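For a rough sense of the capacity-planning arithmetic behind "exabyte scale," here is a back-of-the-envelope sketch. The user counts, per-user storage, and replica count are made-up illustrative numbers, and the chassis figures assume a 60-bay box of 4 TB drives like the kit described in this talk:

```python
def chassis_needed(users, avg_gb_per_user, copies=3,
                   drives_per_chassis=60, tb_per_drive=4):
    """Rough capacity planning: raw storage required for a user base
    under N-way replication, expressed in storage chassis.
    All parameters are illustrative assumptions."""
    usable_tb = users * avg_gb_per_user / 1000   # decimal GB -> TB
    raw_tb = usable_tb * copies                  # replication overhead
    chassis_tb = drives_per_chassis * tb_per_drive
    return raw_tb / chassis_tb
```

For example, 10 million users at 20 GB each under triple replication already works out to 600,000 TB raw, or 2,500 such chassis, which is why the linear-scalability and cost-per-gigabyte questions dominate the planning.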
Another thing that is important when planning for the future is this embrace of open source, because it allows us to adopt innovation rapidly. One thing we're very interested in is storage policies, which are clearly a large part of this. They allow us to fine-tune categories of storage: some people might subscribe to a very high-availability tier, some people might be on a free tier with lower redundancy, these sorts of things. Open source also lets us explore and evaluate new technologies. For example, we're engaging in a partnership with Seagate to explore Kinetic drives and how those drives lower the total cost of ownership. We're also very keenly interested in things like SMR technology. Right now, we've got a very homogeneous 4U, 60-bay chassis with 4-terabyte drives, and it's interesting to think about how we blend that with a 5U, 96-bay chassis with 10-terabyte, or maybe 8-terabyte, SMR drives, and what that does to lower our total cost of ownership. And erasure coding: I think everybody, certainly everybody in the Swift world, is eagerly awaiting the production availability of erasure coding, because it will materially impact our costs. One day I'll be able to go to our CEO and say, okay, here's our total cost per gigabyte per month; and the next day I'll be able to say, okay, and now it's only this. Those are the sorts of things that he loves to hear. Okay, so finally, in conclusion: the lessons we've learned by making these design choices, both in how we design our application and in how we choose our infrastructure, really all fundamentally come down to alignment.
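One quick aside on why erasure coding moves the cost-per-gigabyte number so much: with triple replication, every usable byte costs three raw bytes, while a k+m erasure-coding scheme costs (k+m)/k. A sketch of that arithmetic (the 12+4 scheme and $0.03/GB-month figure are illustrative, not MyMedia's configuration or costs):

```python
def raw_bytes_per_usable_byte(scheme):
    """Storage overhead of a redundancy scheme.
    ('replica', n) -> n full copies
    ('ec', k, m)   -> k data fragments + m parity fragments"""
    if scheme[0] == "replica":
        return float(scheme[1])
    if scheme[0] == "ec":
        _, k, m = scheme
        return (k + m) / k
    raise ValueError(f"unknown scheme: {scheme!r}")

def cost_per_usable_gb_month(scheme, raw_cost_per_gb_month):
    """Cost per usable GB-month, given the raw disk cost."""
    return raw_bytes_per_usable_byte(scheme) * raw_cost_per_gb_month
```

Going from 3.0x overhead to roughly 1.33x with a 12+4 scheme is what makes the "and now it's only this" conversation with the CEO possible.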
So one of the things that I would certainly stress for people who are exploring this is: first, understand your business. Start by understanding what your business is and what your priorities are, because ultimately you need to prioritize what are always insufficient resources; I don't think anybody has too many resources, so prioritization is one of the most important things you can do. As a reminder, for our particular business, we needed low-cost object storage with variable workloads, we needed to grow at consumer scale with a very bursty growth pattern, and we wanted to take advantage of rapid adoption of innovation in order to build toward a lower total cost of ownership. On the application side, it was important to align with the infrastructure and design in both directions: design your networking and infrastructure for your application, and vice versa. And finally, the thing that enables all of this, of course, is people. When you're making these kinds of decisions, it's important to ensure that everybody, whether it's your business units, your infrastructure team, or your application team, is aware of what everybody else is doing. That doesn't mean one person does everything, but it does mean all these teams talk to each other. It's really important that your business understand your infrastructure and your application, and vice versa; it's most effective when everybody has some understanding of the choices being made and of the impact those choices have on each other. People are also where the desire and the perseverance are found. We can talk about choosing a distribution, and I can certainly go to our CEO and say, okay, this choice is cheaper and this choice lowers the total cost of ownership.
But at the end of the day, it really is about the perseverance to go through the documentation and find all those little gotchas and tips and tricks, and that requires certain character traits in all of us. It's embodied by things like attending OpenStack meetings, going to meetups, and interacting with the community. Those are really the only fundamental ways you get a true answer with an emerging technology like this; it is not something you can just read out of a book. It really requires an awful lot of research. And part of this, of course, is also the partners you choose. Here's an example of some of the partners we're working with who help us along the way. We don't claim to know everything; we're in fact relatively new at OpenStack, and these are the partners who give us a boost, accelerate our learning, and, most importantly, transfer the knowledge so that we do become experts. Okay, thank you very much. So this is who we are, and if anybody has any questions, now's the time. Yes, sure, yep, yep. So, the choice between AWS and OpenStack. Here's an interesting fact: my infrastructure team pulled off a Herculean task and actually deployed our entire application in AWS in a matter of maybe a week, or a couple of days, for the express purpose of a pilot. I mentioned we've got global partners, and one issue is that it's really slow if you're trying to serve them out of New York. So the team deployed our entire application in AWS for the purpose of that pilot. So what's the reason we chose OpenStack and Swift for the non-pilot scenarios? Primarily cost. An AWS-type scenario is very good for startups that are focused on compute in some way, shape, or form, or selling a service that is somehow related to compute.
But in our case, it's fundamentally about storage. You can't time-share storage, and nobody's going to rent you storage cheaper than you can buy it. So because of our fundamental business model, it made sense for us to own the drives; that's the only way we're going to have the lowest cost for the particular service we're delivering. Yes? We'd certainly be happy to talk to you about it afterwards; we've got some wonderful spreadsheets. Yeah, that was certainly one of the benefits of doing the pilot: not only could we demonstrate the pilot on the business side, but we could also concretely look at the numbers and say, you know what, those correlated very closely with our projections. It gave us extra validation, because of course the business folks are asking us all the time, why are you doing this instead of that, and this costs more than that. So it's a nice way of illustrating, very concretely, what the true costs are. Okay, all right, well, thank you very much, everybody.