Really good chat. I'm really excited to have Ryan King here with Comcast. It was a big news week for OpenStack with Comcast coming on board, and I think this is one of the few talks that was actually referenced in news media this week, because they were looking for indicators about Comcast being involved. So it's also very cool that we're going to have some fresh, newly released open source. I'll pass it over to Ryan to further introduce the topic. Thanks, Ryan.

Thank you. Yeah, so my name is Ryan King, and we're here today to talk about a project that we call CMB, which is an open source alternative for queuing and notification systems. Really, this talk is for anyone who's considering whether OpenStack is missing components in this area, in which case what we've built might make sense to integrate with OpenStack, or anyone who's building applications on top of OpenStack and looking for something to complete the offering the way somebody like AWS has. And so again, we're from Comcast Silicon Valley. We're a small team up in Northern California that has built, and has now open sourced, this project. I'm going to mainly talk through what it is that we've built, how it works, give a little bit of insight into how we're using it at Comcast, and then field any questions that you guys have. So really this is about giving information to you guys and getting feedback from the community. And for those of you who were expecting to see Matthew Perry up here today, since he plays the character Ryan King in the TV show Go On, I'm sorry to disappoint you. I'm not him. What I am is head of the technology unit for Comcast Silicon Valley, although some of my team is here today, and as they'll tell you, they do all the work and I just stand up here and take the credit. So that's how I think of myself, but really that's probably not how they think of me. Probably not so many tools in my toolbox.
And so the summary of the talk here today is: our team inside of Comcast has built and open sourced a compatible version of Amazon's Simple Notification Service and Simple Queue Service on top of Cassandra and Redis, which are obviously two well-known open source technologies. So that's what we're going to talk about today. So who are we? We're actually a small startup that was gobbled up by Comcast a couple of years ago. We build consumer-facing internet products for the company, mostly on web and mobile and a little bit on TV. I'll show you, towards the end of the presentation, some of the work that we've done which is powered by the system that we've built. And so let's talk about it. I referred to it as CMB. What does that mean? Well, CMB originally stands for Comcast Message Bus, but now that it's open source, you guys are welcome to call it the Cassandra Message Bus. It really is composed of two parts: a queue service, which is modeled after SQS, that we call CQS, and a notification service, again modeled after SNS, that we call CNS. So I'm going to talk in detail about these two parts. Now you may be asking yourself, wait a minute, why did you guys build your own? Isn't there stuff already out there? I thought this graphic was particularly suitable for this crowd here today, given that that's what OpenStack does. So I'll talk through some of the reasons why we decided to build our own rather than using something that's out there already. As you may know from some of the announcements in the last few days, Comcast is actually working on building out its own private cloud, of course using OpenStack. A few of my colleagues are here working on that effort. And this internal private cloud is going to be powering all of our next generation services, including our next generation TV platform, which has already started to roll out, which we call X1. I'll get into a little more detail about what that looks like and how that works towards the end of the presentation.
And for us, latency is extremely important. Every millisecond counts. As you can imagine, if you're powering a TV service, every click on the remote is going out to a service in the cloud. You don't really feel like waiting 100 or 200 milliseconds to go out to something like Amazon or anything else. So we're really trying to optimize extremely hard for latency. And of course, it has to be cost effective for us at the kind of scale that we run and are going to offer this service at — we're talking about tens of millions of users, easily, probably 50 million users. And so we did take a look at what else was out there. There are various queuing systems and message delivery systems; here are some on this board that we took a look at. But unfortunately, none of those could really meet all of our requirements, so I'll talk about why. The requirements we had, primarily, were: we wanted compatibility with Amazon, for a couple of different reasons. Number one, it gives us the ability to do elastic public-private hybrids if we need to. But number two, we kind of like the model that it forces applications to be built around. Some of the trade-offs of availability versus robustness are, we think, the right ones and the right models for how to build applications, so we like to model our stuff after that, or after OpenStack. And one really big, important requirement for us — the one we had the most trouble finding something existing that met it — was sort of a hot-hot, you know, active-active multi-data-center setup. We really want extremely high availability. With some of these queuing and notification systems that are in the cloud today, for example with Amazon, they have availability zones, and you can put a message in a queue in one availability zone, but it's not accessible if that availability zone slash data center, you know, bombed. So we actually want to raise the bar in terms of availability on that.
Of course we need horizontal scalability. If we need more messages per second through the queue, if we need more notifications delivered per second, we just want to be able to add more nodes without changing anything else about the architecture. Specifically for our notification system, we need guaranteed at-least-once delivery. This is what many of these other things provide, and it's an important requirement for us. We're not going as far as saying exactly-once delivery, and that trade-off is pretty important in terms of the ability for it to be extremely available and highly scalable. And, as I mentioned before, extremely, extremely low latency. We're probably talking about — we like 10 millisecond response times in the average case and a worst case of 90 or 100. When I talk about really low latency, that's really what we're talking about. And so some things that weren't requirements for us, that we took into consideration, were: number two, duplicate messages can happen; and number one, order is not guaranteed. See what I did there? A little humor for you. And of course these non-requirements are also true of Amazon's — they don't guarantee that the messages you put in a queue will be taken out of the queue in the same order, and neither does our system. So let's dive in and talk a little bit about the queue service and how it works. Let me first start off — hopefully everyone's familiar with Amazon's Simple Queue Service. The PR folks at my company made me take out as many references as possible to that company in Seattle, so let's just say it's from a popular cloud service provider in Seattle. The Simple Queue Service offers a reliable, highly scalable, hosted queue for storing messages as they travel between computers. That's what the Simple Queue Service is. And as you can see from this diagram, it is really that simple.
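Since duplicates and reordering are explicitly allowed, consumers of a system like this usually make their message handlers idempotent. Here's a minimal sketch of that pattern — my own illustration, not part of CMB, and a real consumer would persist the seen-ids set somewhere durable rather than in process memory:

```python
def make_idempotent(handler):
    """Wrap a message handler so redelivered duplicates are processed once.

    With at-least-once delivery, the same message id can arrive more than
    once; remembering processed ids turns duplicate deliveries into no-ops.
    """
    seen = set()

    def wrapped(msg_id, body):
        if msg_id in seen:
            return False  # duplicate delivery; already handled
        seen.add(msg_id)
        handler(body)
        return True

    return wrapped
```

The effect is that the queue only has to promise at-least-once delivery, and the consumer upgrades that to effectively-once for its own side effects.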
There are queues that live in the cloud, producers that put items into the queue, and consumers that take items out. There's a whole bunch of methods in the queue service, but the ones we care most about for the purposes of today's discussion are: creating a queue; sending a message to the queue; receiving a message from the queue, which actually just gets a copy of it and marks it as invisible in the queue; and then deleting a message, which actually removes it from the queue. Those are the main methods that we're dealing with. Now let's talk about our implementation. As I mentioned in the beginning, our implementation is built on Cassandra and Redis. We have a set of API servers which handle those API requests — create queue, send message, delete message, receive message, and so forth. All of the data in the queues is actually persisted to Cassandra, and I'll talk in a little more detail about how that's done in a second. And then Redis essentially functions like a cache, to get the sort of latency that we like. So first let's talk about what we're storing in Cassandra. Basically, the way we've done it is that each queue is sharded across a number of different rows in Cassandra. We've picked 100 as a default, but you can tune that. The reasons we do this are: we like to avoid having really, really wide rows in Cassandra — so if you had 500,000 items in the queue, we'd want to shard that across a bunch of different rows (obviously that's tunable) — and it reduces the churn of what's happening inside of Cassandra. For those of you familiar with Cassandra, think about it in the context of a queue: a queue is sort of by definition transient data. Items are put in the queue, written to the queue, hopefully pretty quickly read out of the queue, and deleted from the queue. So there are a lot of items that get created and destroyed fairly quickly.
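To make that sharding concrete, here's a rough sketch of how a message might be mapped to one of a queue's shard rows. This is my own illustration, not the actual CMB code — the row-key format and the hash choice are assumptions; only the default shard count of 100 comes from the talk:

```python
import hashlib

NUM_SHARDS = 100  # CMB's default shard count per queue; tunable

def shard_row_key(queue_name: str, message_id: str,
                  num_shards: int = NUM_SHARDS) -> str:
    """Map a message to one of the queue's shard rows in Cassandra.

    Hashing the message id spreads writes evenly across the shard rows,
    which keeps individual rows narrow and distributes load (and churn)
    over the whole ring.
    """
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{queue_name}_{shard}"
```

With 500,000 messages and 100 shards, each row ends up holding on the order of 5,000 items instead of one very wide row holding all of them.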
And so there's a lot of churn that happens, and by sharding over a bunch of different rows in Cassandra, we can help reduce some of the negative effects of that churn. And then of course distributing over rows helps distribute the workload over the entire Cassandra ring. So that's how we're storing data in Cassandra. Now let me talk about what Redis is for. Basically, Redis stores all the metadata for the queue, all of the visibility data, and a cache of the queue data that's in Cassandra. The message IDs are in a list, the visibility information is in a hash table, and the payload cache is essentially just a set. So, a couple of notes on why we did things that way. Under normal circumstances, when things are happy, we actually have very few duplicates being delivered — in terms of getting the same item out of the queue more than once — and it generally behaves as a FIFO queue. The reason we don't strictly adhere to ordering and no-duplicates is that when things start going wrong — when nodes fail, when consumers fail, things like that — those restrictions will essentially be violated. And we have seen that, even though we've done a lot to minimize the effect of this churn that I talked about in Cassandra, we will see some degradation in performance over time, so we've had to do a lot of tuning on our own deployments of these services to deal with that. The good news is, of course, as you can tell from the architecture, a basic receive message, which reads an item from the queue, is handled entirely from the Redis cache, so we can achieve the type of latency we want for that normal situation, the common one. So now, this first diagram talks about some of the performance characteristics of the queue service. In this case, along the y-axis is latency — the response time of a receive message call — and the x-axis is the throughput.
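As an aside, the Redis layout just described — message IDs in a list, visibility in a hash, plus a payload cache — can be modeled with a small in-memory sketch. This is my own toy illustration of the semantics, not CMB's actual Redis schema; the names and the 30-second default timeout are made up:

```python
import time

class QueueCache:
    """Toy stand-in for the per-queue Redis structures."""

    def __init__(self):
        self.ids = []          # Redis list: message ids in arrival order
        self.visibility = {}   # Redis hash: id -> time it becomes visible again
        self.payloads = {}     # payload cache: id -> body (backed by Cassandra)

    def add(self, msg_id, body):
        self.ids.append(msg_id)
        self.payloads[msg_id] = body

    def receive(self, visibility_timeout=30, now=None):
        """Return the first visible message and hide it for the timeout."""
        now = time.time() if now is None else now
        for msg_id in self.ids:
            if self.visibility.get(msg_id, 0.0) <= now:
                self.visibility[msg_id] = now + visibility_timeout
                return msg_id, self.payloads[msg_id]
        return None

    def delete(self, msg_id):
        """Remove the message for good."""
        self.ids.remove(msg_id)
        self.payloads.pop(msg_id, None)
        self.visibility.pop(msg_id, None)
```

Note that if the visibility hash is lost, the only consequence is that in-flight messages become visible again and get redelivered — a duplicate, which the at-least-once contract allows.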
So we're trying to drive as many messages per second through the queue as we can. This particular example was done with 10 queues, and two producers and two consumers for each queue. As we move to the right of this chart, we're pushing more and more messages per second through the queue, and all of this is done with a single API instance. You can see that even as we get towards 1,000 messages per second on a single instance, in the lower-end cases to the left of the chart we're below 10 milliseconds, and even when we're pushing it really hard, we're sub-100 milliseconds, around 80. And we believe that we can add more API nodes and simply scale this out, so that the chart can keep going to the right without the latency blowing up. Let's talk about why we chose Cassandra, since there are lots of other storage systems we could have opted for. Some of the main reasons are the latency of writes, the throughput of writes, and really the eventual consistency that we get across data centers. We can put something in Cassandra and eventually it'll show up in another data center, so that if we lose an entire data center, that item is still in the queue and still available for other consumers. That's probably the biggest reason why we chose it. There wasn't really a lot else out there that could give us the write throughput that we needed. I mean, there are other eventually consistent systems, but this one was the most robust for what we needed. And then, of course, we can just keep adding nodes to the Cassandra rings. The next question you might ask is: great, well, what's wrong with just using Cassandra? Why add anything else to the picture? If you understand the way the queue service works, there's the message visibility — which is to say, when I take a message off the queue, I mark it as invisible so no one else can see it, I process it for a while, and then I delete it from the queue.
That message visibility is often changing very frequently — it's boom, invisible; boom, out of the queue. And it's not important that it be durable, because if you lose the visibility data, that just means that someone else will pick up a duplicate, which, as we said, is — well, not the norm, but okay. So taking this into consideration, and also the consideration that we need really low-latency reads, that's why we decided on an in-memory cache that doesn't have to be durable: we can basically trade off some durability in areas where we don't need it. And then, of course, I kept talking about churn in Cassandra — basically, by having the visibility data live only in memory, we're not having to do twice as many writes to Cassandra for the visibility. Now let's talk about the notification service. That was the queue service; now we're going to talk about the notification service. Again, our friends up in Seattle describe this Simple Notification Service as a web service that makes it easy to set up, operate, and send notifications from the cloud. Basically, the way it works, as you can see here, is as a set of topics. Publishers publish messages to topics, subscribers subscribe to those topics, and whenever a message is published to a topic, every subscriber will receive it. You can have as many subscribers as you like on a given topic, and there's a variety of different methods for transmitting that information to the subscriber — the most popular, of course, being a simple HTTP POST, but you can do email and so on. For the purposes of this talk, you can think of it as HTTP POST. And so the main methods in the notification service are creating a topic, subscribers subscribing to a topic, and publishers publishing to a topic. It's that simple. Now let's talk about our implementation, which we call CNS.
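Before getting into the implementation, the topic model just described can be sketched in a few lines. This is a toy illustration of the semantics only, not CNS code — in the real system the "callback" would be an HTTP endpoint, an email address, or a queue:

```python
class TopicService:
    """Toy model of the SNS/CNS publish-subscribe semantics."""

    def __init__(self):
        self.topics = {}  # topic name -> list of subscriber callbacks

    def create_topic(self, name):
        self.topics.setdefault(name, [])

    def subscribe(self, name, deliver):
        # `deliver` stands in for the real protocol (HTTP POST, email, ...)
        self.topics[name].append(deliver)

    def publish(self, name, message):
        # Every subscriber of the topic receives every published message.
        for deliver in self.topics[name]:
            deliver(message)
```

The interesting part of a real implementation is doing that `publish` loop asynchronously and in parallel for thousands of subscribers, which is what the CNS architecture below is about.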
This picture is a little bit more complicated, but the most important thing to note about it is that the entire thing is actually built on top of CQS, the system I just talked about. At the top, the green bubbles are the consumers of the system — the publishers and the subscribers. We have a set of API servers which implement the methods that I described, as well as others. Cassandra, in this case, is really just storing the list of subscribers per topic — it's actually pretty lightly used here, though of course it's used internally inside of the CQS cloud as well. The rest of this works as follows: when a message is published to a topic, it's inserted into a queue that says, we need to deliver this message for this topic. The delivery producer figures out who's subscribed to the topic — and there might be, you know, 10,000 subscribers or some large number — so it breaks it into chunks and fans out the delivery for these delivery consumers to go and actually do the HTTP POSTs to all the subscribers. So this asynchronous system using the queue is essentially just there to fan out the work and do as much of it in parallel as possible. We have lots and lots of consumers with lots and lots of delivery workers sending these notifications out in parallel when one event is published. The only other thing to note about this is that we have two different pools inside the delivery consumer. The reason for that is that we're trying to take into account that there may be timeouts and latency errors happening on subscribers — since we can't control them — and so we're trying to separate the efficient delivery of notifications to working subscribers from the sort of problem children, which we stick over here in the delivery jail for a little while.
They have a separate pool of workers dedicated to the ones that have failed, because we expect them to be more problematic in terms of timeouts. So that's the basic picture of CNS and how it's implemented. Now let's talk a little bit about how this scales, since obviously that's really important to us. If we want to increase the number of publish requests, it's very simple: we just add more API servers, because those guys are completely stateless — all they have to do is receive a published message and put it in a queue. If we want to scale the number of subscribers for a topic — say we want to have 10,000, 20,000, and beyond — we simply add more of these delivery producer workers, which are the background processes pulling things off of the queue and fanning them out. And then of course we can publish to endpoints more quickly by having more of these delivery consumers with more threads. Overall, to scale CNS, we primarily have to scale CQS, but also Cassandra. So now some charts which describe how we think this scales. The first chart talks about throughput scalability, the y-axis being latency — and in this case, when we say latency, we mean the end-to-end latency from the time a publisher publishes a message to the topic to the time the last subscriber receives it. Along the x-axis is throughput, so, you know, number of messages per second. In this case we used a single topic with 100 subscribers, and we were seeing whether adding more workers could back up the claim I just made, that simply adding more workers will allow us to increase the throughput while keeping the latency low. And that's what you can see highlighted there in green: we've doubled the number of workers and are able to effectively double the throughput while keeping the latency fairly low.
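The fan-out step described earlier — the delivery producer splitting a large subscriber list into chunks that become independent work items on the queue — comes down to something like this. A sketch only; the chunk size of 100 is an assumption for illustration, not CMB's actual setting:

```python
def chunk_subscribers(subscribers, chunk_size=100):
    """Split a topic's subscriber list into fixed-size chunks.

    Each chunk is enqueued as one work item, so many delivery consumers
    can pick up chunks and POST to their subscribers in parallel; adding
    more consumers raises delivery throughput without changing the design.
    """
    return [subscribers[i:i + chunk_size]
            for i in range(0, len(subscribers), chunk_size)]
```

This is why the subscriber-count scaling is roughly linear in the number of workers: a topic with 10,000 subscribers just becomes 100 independent chunks competing for idle delivery consumers.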
The next topic on scalability is some performance results we've done to try to back up my claim that we can increase the number of subscribers in a topic in a scalable fashion. Similarly, in this case, it's end-to-end latency on the y-axis and number of subscribers per topic on the x-axis. In this particular test we used a constant publish rate of five messages per second to a single topic, and then ratcheted up, as you go right in the chart, the number of subscribers to that topic, to see how the system could handle it. You can see that as we start to get up in the number of subscribers per topic with only three workers, our latency starts going way up. We can double the number of workers to six and get a lot farther before it starts going up. So in theory we can just keep adding more workers and get lines that look like that. And all of this, by the way — all these performance tests — was done with a pretty wimpy eight-node Cassandra ring underneath it. Okay, so we've built these systems; now let me talk a little bit about what we're using them for. Since these are generic private cloud services of ours, we obviously plan to use them for lots of different purposes, but I'll talk about a couple that we're already using them for in production, to give you a sense of the kinds of things that we might use these for. The first one: our new X1 TV platform has this cool app that our team in Silicon Valley built that we call the sports app — you can see it on the right side there. What it's meant for is the hardcore sports fan who maybe is watching one game and wants to keep track of one or a bunch of other games. We're pushing updates to the scores and the play-by-play in real time to everyone's TV, so you can see, you know, there's Kansas City's in the
red zone, you know, Cassel just passed — and that's just updating on the screen; the user's not doing anything, and it's updating in real time. So we use the notification system, CNS: every time, you know, the Giants make a score or the Niners make a play, we push that out to all of the subscribers, which in this case are the users who have this little side panel up on their screen. Here's just another example of that, where there are multiple games. I might be into baseball, and there's a bunch of games going on at once, and I want to watch one main game but sort of keep track of the scores of the other games; in that case it's just the scores that are updating in real time. So that's a little bit about how we're using it, mostly centered around the notification service, but as you can imagine, queuing and notifications are pretty important parts of just about anything you want to do asynchronously. So, you know, why am I telling you about this, and why here? Well, we've open sourced this effort, so it's actually now on GitHub — you guys are all welcome to check it out and contribute to it, and we're really interested — I mean, the reason I'm here is to get feedback from the OpenStack community, and if other people want to use this or adopt it, that's great for us. The questions I had that led me to this stage were: should this be part of OpenStack? Does it fit in somehow? Obviously AWS has these services as part of it; OpenStack doesn't yet have anything like it, so there's sort of a hole there. I wanted to get feedback from everyone that's part of the OpenStack community and figure out how to go from there. As you probably saw, Comcast is now a contributor to OpenStack, and we're talking with various folks about foundation status. We're in a close strategic partnership with Cisco, collaborating with them to help us build our own
private cloud using OpenStack. So we're really looking to get a lot more involved in this community, and open sourcing this is sort of the first toe in the water in that direction. So those are my slides. If you have any questions, feel free to stand up and ask. Thank you — over here first. I'm sorry? Oh, the question was: is any of this relying on block storage from OpenStack? No, all of this stuff is application-level code — it's all Java code that's relying on Cassandra. If you decided you wanted to run Cassandra on block-level storage, then the answer would be yes, but our code doesn't store anything directly; it's using Cassandra for that. Yeah, so the question was: have we thought about multi-tenancy? I think, you know, one of the things I mentioned earlier was having it available in multiple data centers, right — so we have the availability requirement — and we do plan to have the same sets of instances serving many different applications across many different data centers. That's a really good question, though, because we haven't decided — there are lots of good discussions about: okay, do we stand up dedicated instances of CQS and CNS per application, so we understand the usage characteristics and stuff like that and can scale them out fairly, or do we just stand up one generic cloud that we scale out across data centers and let everyone use it, sort of more the model that Amazon does. So I don't think we've decided that; there are certainly lots of arguments on both sides. How large have we built this so far? How big have we gotten our clusters? So far, like I said, we're in two data centers, each with eight-node rings of Cassandra, with, you know, a half dozen of the workers and such. And right now the platform that I mentioned, that I showed screenshots of, is just in the process of being
rolled out. It's in Boston, it came to San Francisco — we're in the tens of thousands of users range, but going to quickly scale up to a million. So the answer as of this moment is fairly small, but pretty quickly the answer will be fairly big. How many messages — what's our peak messages per second? You know, we probably have some thousands per second right now, so it's in the same range of parameters as the performance tests I described. But we can easily imagine more — one of the other use cases we're using this for is that we do all the DVR scheduling in the cloud. So we think about these crazy use cases where a TV ad comes on that says, hey, record American Idol tonight, push the record button — and you get 20 million people clicking record right then. What do you do? That's sort of a great extreme stress case for the queue, right? We put the fact that this user wants to record American Idol in a queue and schedule it all in the cloud, because we could have, you know, 20 million requests come in in a very short period of time. So that's kind of what we're thinking about designing for, but we're not deployed at that scale yet. The question was: does it support auto scaling? We haven't — we've just built the services themselves and open sourced them, and that's one of the reasons that we wanted to open source it: because we haven't built a lot of the management and operational tools around it, and that's exactly what we think folks in the OpenStack community are fantastic at — you know, how do you build in a lot of that automatic stuff. So we really just built it as a service and started to deploy it, realizing that we need a lot more rigor around tools and things like that. We're using the version of OpenStack that we're collaborating with Cisco on for our own private cloud stuff, and so really this is the first stab at a
service that could go alongside that. Whether we'd use it internally for how the OpenStack components communicate with each other — yeah, it's possible, but we haven't thought about it yet. Yes? Okay, so the question was: what are we representing when we have these multiple layers of Redis? We really just do straight-up sharding. We have a bunch of shards of Redis right now, and it's just straight sharding off of, you know, a hash of the queue name. We haven't separated them, but you could imagine sharding it also sort of horizontally — or I guess vertically — where we keep the metadata caches separate from the data caches. We haven't done that yet. So it's literally: a single queue, right now, will only be on a single Redis instance; if you have a thousand queues, you could have them spread out. That's something we've thought about, yeah. And we're not explicitly — let me back up a couple more — the entire system actually operates even if Redis is down, in a degraded state where we're returning a lot of duplicates and the ordering is all messed up and stuff like that, but it actually works and still meets the letter of the specification of how we're supposed to operate; it's just probably a lot slower and so forth. Okay, yeah, sure, I can introduce you to them. Yeah, so — I actually explicitly simplified this to only talk about these; we've actually implemented the entire set, ChangeMessageVisibility and all the rest. I just didn't want to go into the details of every single one and put it all on the slides. The question was: do we have full compatibility with the latest version of the Amazon SQS spec? And the answer is yes — just for simplicity's sake we highlighted a few of the more interesting methods. Yes — some features of our new TV service? I could give a whole talk about that; actually, my colleague Brian over there in the back
of the room could probably talk for two hours on that. You know, it's our next generation cloud TV service, and the most interesting thing about it is that all of the logic is on the server side, so we can make changes very, very rapidly. In traditional cable set-top-box TV systems, the UI is — how do I say this politely — not very good, and doesn't change very often. Not going to piss off too many people. With the cloud-based system that we've developed, all the code that we write — what renders the screenshots I showed, as well as the entire service, the guide, the DVR, everything — is all rendered in the cloud and just displayed on your TV. So we can change it, you know, every day if we wanted to. We probably won't, but we can certainly change it much more rapidly. And the thing that's connected to your TV is actually IP-enabled in this new service, which seems like an obvious thing but hasn't been the case in the past, so that opens up a whole bunch of new capabilities. But to more directly answer your question about some of the new features our new X1 service has: you should go and check out comcast.net/x1, which will give you a quick overview of what's in there. Of course it has a guide and a DVR and the things you would expect, but it's a complete replacement for the cable set-top-box TV service you would have in your house today, and then it has a whole bunch more on top of that. We have iPhone apps that you can pair it with, which let you use gestures to control your TV, and we have apps on there — Facebook, Pandora, etc. So, a whole bunch of stuff. Any other questions? Thanks a lot, guys.

Thanks, Ryan. I love seeing open source projects based on at-scale, real use cases, so I'm really excited by Comcast's contribution. It's lunchtime now, and I believe it's at the far end, where the breakfast was. I think at 1:15 there might be lightning talks in here, so make sure you get
your food and then come back in here for the lightning talks, and then this track will continue in this room thereafter.