Hi everyone. I'm John Hopper, one of the lead engineers here at Rackspace in our cloud integration group. We deal with tying all of the cloud systems together, and in a recent initiative we've been trying to tackle the problem that everybody seems to be dealing with: how do we log a large number of systems at scale, and do it in a tenanted way? Part of our drive at Rackspace is to build solutions for OpenStack ourselves internally, but also to relay that business value to our customers. So to get us started, I'm introducing our project, called Project Meniscus. We call it Meniscus because a meniscus is a type of lens normally found in your eyeglasses, and so Meniscus is the focusing lens for logging events.

Let's start off by just defining what a log is. Traditionally, logs are static files that live on a local machine, and they contain messages written to them by services that run on the system. Usually these messages are intended for the system operator. They communicate the internal state of the application. They communicate when things have gone wrong, when there may have been a damaged state, or maybe when a client is just hammering the system with bad credentials. The overall idea is that logs are very important. The information inside these logs is hugely important to the operation and health of your systems and your cluster as a whole, as well as uptime for your customers. If you ignore your logs, chances are you'll ignore a problem that's gone unnoticed for weeks, if not months. You may have lost customers, you may have lost data, or worse still, you may have lost entire instances and not even known about it. So really, we want to capture these events.

But large-scale log management is very difficult. We have applications in our stacks these days, like web servers, that produce thousands of log lines a second. You're looking at a single box in your infrastructure producing maybe 10 gigs, 50 gigs, 100 gigs of log data per day. That's very difficult when you have to scale your log management system out over, say, 50 web heads; 50 times 100 gigs is a lot of data per day.

And it's all about latency. If you're collecting all of these logs, aggregating them, and shipping them, that's not too difficult to manage. You can always compress them, tar them up, send them off to an offline repository, and investigate them later. However, if you're dealing with an issue in the now, that's not very helpful. It's very opaque, because you have to go through all of these logs and manually correlate between logs that you may not even know need to be correlated. So batch processing, in my opinion, is too slow; it doesn't really fit our needs. I want to know when something goes wrong, and I want to know as soon as possible. In addition, I want to be able to correlate that later with other events that happened maybe a week ago, a month ago, two months ago. There are a lot of use cases for logs, but managing them and shipping them isn't the whole story.

And so we built Meniscus for large-scale log management. We decided to look at the problem as a whole and investigate a number of different solutions. When we started this project, we started by looking at the usual suspects: why not Graylog? Why not Logstash, Scribe, Flume?
This problem has been hammered on so many times; in fact, syslog has been around for at least two decades. So why a new project? Well, the problem is that different users have different needs. If we build this as a tenanted solution, I may have a financial user with a restriction on their log data that says they can't store anything unencrypted on third-party solutions. If you're a cloud provider, that's not good, because it means they can't use you, and you are the third party they cross off their vendor list. So there are all of these needs: multi-tenancy, encryption, durability (a customer may not want their messages to ever die, even if your systems fail), compression, and a whole bunch of others. Log processing is a diverse problem space, and unfortunately a lot of these systems only deal with one or two of those needs, and none of the existing systems are tenanted. I can't have multiple users specify different filters, different aggregations, different endpoints, different encryption keys, and so on without building a separate infrastructure for each customer. Simply put, a service-oriented system must address these needs directly, and that was the impetus for designing Meniscus and really going down this rabbit hole.

So we decided to start with a clean slate and really investigate the architecture as a whole and what it meant to provide logging as a service. A couple of tenets came up in our discussions. One of them was that we must change client architecture as little as possible. Logging has been around ever since we've had data sitting in memory, so we really didn't want to force customers to install our agent, and we didn't want them to modify their infrastructure just to use our service. In addition, since logging has been around for so long, there are innumerable standards built around it. Look at the RFC tracks and there are probably about 100 related specifically to logging. So it didn't make sense to build our own standards or our own schemas; we wanted to reuse as much of what was out there as possible.

Another tenet was that it had to be performance oriented. If we wanted to build this as a service for multiple customers, on one side we could have PayPal producing two terabytes of data a day, and on the other side of that spectrum, say, a handbag website that only produces maybe a meg's worth of log data per day. We had to be able to tackle the large and the small at the same time.

And lastly (granted, this is kind of a given, since we're all talking about cloud and operations at OpenStack these days), the solution had to be completely automated. By completely automated, I mean we wanted to take a hands-off approach to developing and building out clusters of this system. When you talk about building out clusters, you talk about things like Chef and Puppet and the orchestrations you build with them. But we wanted to take it one step further with this system and really hammer down what it meant to be automated.

As we were going through this, we were looking at technologies, and we noted that OpenStack is built in Python. There is a wealth of libraries in Python and a wealth of information on implementing in Python: tons of books, and a whole ecosystem built around the language. We wanted to leverage that.
So we decided to try to build Meniscus on top of Python. But wait, isn't Python kind of slow when it comes to really low-level processing? If we're dealing with event streams that may produce thousands of events a second, messages have to be parsed byte by byte, or they may have to be mapped, and Python is traditionally not very fast at these types of problem sets. So we ran a couple of research initiatives on just how fast Python can process log data.

There's a project out there called loggerglue, and that's where we started. loggerglue is a syslog parser that follows the latest syslog standard, RFC 5424, introduced in 2009. That standard is a step above the previous syslog protocol because it introduces something called structured data, which is an interesting way for systems to communicate all of their data fields without that big monolithic log message we're all used to seeing and chopping up with grep and awk and sed while pulling our hair out.

So we wrote a couple of parsers building on top of that. loggerglue gave us about 300 messages a second. Our second implementation was built on top of the regex engine, but it had a caveat: it required the whole message to already be in memory. So it was more of a thought exercise, but we were able to extract about 10,000 messages a second. These messages all had a static length of around 256 bytes: a pretty short log message, but typical if you look at most Linux infrastructures and how their syslog daemons push out messages from the kernel, auditd, things of that nature. Then we wrote a pure Python parser, and it turns out that the pure Python parser, a single-byte lookahead parser, performed worse than the regex parser. That got us thinking. The algorithm was sound, and we even ran it in PyPy just to make sure we weren't insane. It turns out that PyPy got us almost all the way there: 55,000 messages a second for one process, with messages around 256 bytes. But there was a problem with PyPy: it doesn't fit well with OpenStack. OpenStack uses a lot of libraries that have C extensions, and not many of them play nicely with PyPy. So while it was a good thought experiment, it really wasn't something we could standardize on.

That's when we found Cython. Cython as a language is very close to the system; it's a very thin layer on top of all of these rich C libraries. So we decided to reach into that world and see just what we could make Python do, and it turns out Python can do quite a lot. With numbers like this, we determined that with one machine we would be able to process all of the raw syslog messages from our Nova installation about six times over. That was pretty impressive, because this was just one process, one process on a box that has eight cores. Depending on how big your box is, that number goes up immensely.

Then we thought to ourselves: hey, Python may just work. We had answered the performance question, and that was really the crux of a lot of our research and development. If we could make Python perform, Python gives us all these wonderful tools and access to really strong ways of describing this data and making it accessible to your sysadmins or your developers, without having to dig into cryptic C libraries or go into other languages that may not fit your deployments. If my presentation works the way I expect it to...
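As a rough illustration of the regex experiment described above, here is a minimal sketch of pulling the RFC 5424 header fields out of a syslog line with one precompiled pattern. This is not the Meniscus or loggerglue parser, and it shares the same caveat our regex implementation had: the whole message must already be in memory.

```python
import re

# Minimal RFC 5424 header split (illustrative only). The "rest" group
# carries the structured data plus the free-form message.
RFC5424 = re.compile(
    r'<(?P<pri>\d{1,3})>'        # PRI: facility * 8 + severity
    r'(?P<version>\d) '
    r'(?P<timestamp>\S+) '
    r'(?P<hostname>\S+) '
    r'(?P<appname>\S+) '         # APP-NAME, used later to identify flows
    r'(?P<procid>\S+) '
    r'(?P<msgid>\S+) '
    r'(?P<rest>.*)',
    re.DOTALL,
)

def parse(line):
    match = RFC5424.match(line)
    return match.groupdict() if match else None

print(parse('<165>1 2013-04-15T22:14:15.003Z web01 nginx 6542 ID47 '
            '[origin ip="10.0.0.1"] GET /index.html 200'))
```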
So after we figured out that Python could fit the bill, that it could really perform to the point where we weren't too worried about scaling the processing of the raw data, our attention shifted to making the system scale out. We talked about customer concerns earlier, where a customer may require encryption, compression, or a number of other features, and we wanted a system that could scale out each individual responsibility as much as we needed to.

That's where we started designing this thing called the grid. Grid is a very ambiguous term, but really what it is is a collection of boxes that route to each other. We designed the system in such a way that the message itself tells the system where it goes, and the system determines, based on the message content, which layer it needs to go to next. Tenants, when they configure what we call event producers (basically the event streams of the individual logs), can tell the system: hey, any message from this event stream needs to be durable, it needs to be encrypted, it needs to be compressed, and I want it stored in Cloud Files.

So we took each responsibility and broke it out into a piece in the grid. We call these pieces workers. A grid is made of workers, and each has one responsibility: one worker may do compression, one worker may do encryption, another worker may do some type of correlation, another worker may emit events to an alerting system. In this way, we're able to stream-process all of these events but still get our real-time notifications and our batch-job rollups.

But that alone isn't scalable, right? If you just build a chain of machines that forward messages to each other, how do you scale that? We went one step further with our ideals and decided to make the worker completely stateless, and that led to an interesting organization. We call our layers service domains. We group all of these stateless workers together into layers that exist within the grid, and when a message is routed, depending on what it needs next, it goes to one of those layers. A layer can comprise two nodes or it can comprise 50; it really doesn't matter to the upstream worker, because all it's doing is handing off the message once it's done with it. So now we have a really powerful way of abstracting that horizontal scaling while maintaining streamability in the system.

When we route between different service domains, what we're really doing is enhancing the message, processing it, doing something to it; there's palpable work being done. And if we scale each individual layer out horizontally, then if we need to handle more encryption, I don't have to scale the systems that handle compression. There's just no reason to. It's a very efficient way of designing the cluster to maximize the utilization of our resources.

It's a great idea to have this really big cluster of responsibilities, but how do we manage something like that? We looked at how Chef and Puppet do their orchestrations and decided that some of those orchestrations might be better lived within the cluster. We wanted the cluster to know about itself.
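To make the routing idea concrete, here is a hypothetical sketch of how a stateless worker might pick its next hop. The domain names and message fields are illustrative assumptions, not the actual Meniscus data model; the point is that the message carries what it still needs, and any node in the next layer is equivalent.

```python
import random

# Illustrative service domains: each layer owns one responsibility and
# can be scaled independently of the others.
SERVICE_DOMAINS = {
    'encryption':  ['enc-01', 'enc-02'],
    'compression': ['zip-01'],
    'storage':     ['store-01', 'store-02', 'store-03'],
}

def next_hop(message):
    """Pick the next node from the message's remaining needs."""
    needs = message['needs']          # e.g. ['encryption', 'storage']
    if not needs:
        return None                   # fully processed, nothing left to do
    # Any node in the layer will do: workers are stateless, so scaling
    # one domain never requires touching the others.
    return random.choice(SERVICE_DOMAINS[needs[0]])

msg = {'tenant': 'acme', 'needs': ['encryption', 'storage'], 'body': '...'}
print(next_hop(msg))   # -> one of the encryption nodes
```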
We wanted the cluster to be able to route around damaged systems, to route service domains around the damaged portions of the cluster, and to be able to tell when load is hitting a peak or when a system doesn't have enough disk space to continue caching messages. So we decided on this model of pairing. When a worker is instantiated, when we build up a box, we call it a worker. We have one image, and this image has only one thing on it: a tiny configuration. That tiny configuration tells the worker who it is. We call these personas. A persona is basically just a way of describing the role of the worker.

What's really interesting about the pairing process is that it's secure. We support authentication right now with Cloud Auth, and the handoff of those credentials isn't done by configuration on disk. In fact, when Chef finishes spinning up a node, it makes an HTTP POST to a service belonging to Meniscus, on the worker itself, that listens on localhost and only localhost. When this HTTP request is made, it informs the worker of who it is, how it authenticates, and where to go to authenticate and become part of the cluster, a working piece of this large orchestration. Sorry. There we go.

This bootstrapping process gave us a lot of flexibility. It meant we could use one image and have it cater to every single node within the cluster. It meant that for all of our orchestrations built in Chef, all of our cookbooks, we didn't have to build a separate cookbook for the encryption nodes versus the compression nodes. That was valuable to us because it minimized the amount of DevOps work we had to do while still making the system pretty smart.

Now that we've built this big system, how do we maintain state? We're building all of this intelligence into the cluster, but we need a way to disseminate that information. We also realized that somewhere along the line, a customer may have the requirement to front-load all of their encryption; for compliance reasons, they may not even be able to transmit unencrypted logs to us, and that was a deal breaker for a lot of the large customers. So we looked at the architecture as a whole and decided that workers would periodically contact another type of worker called a coordinator. Coordinators are responsible for managing cluster state, and this can be done through polling. Every time a worker decides it's time to see whether cluster state has changed, or whether its role in the cluster has changed, it makes a GET request against a coordinator. All coordinators are backed by MongoDB, which is really kind of unique, because it means that any actions that happen on one coordinator are visible to all the other coordinators as soon as the write commit is confirmed. In fact, talking to any one coordinator in a cluster is equivalent to talking to all of them.

This was a really powerful way of synchronizing the cluster, but it had latency issues. You can poll as quickly as you want, but in the end all you're doing is wasting bandwidth. We kept that solution to support external customers, but for the internal case we came up with another worker personality, which we call the broadcaster. The broadcaster works in conjunction with the coordinator.
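A hedged sketch of what that pairing handoff could look like, assuming the requests library is available; the URL and payload fields here are illustrative guesses rather than the documented Meniscus pairing API.

```python
import requests

# Once Chef finishes spinning up a node, it POSTs a tiny persona config
# to a Meniscus service bound to localhost only. Field names and the
# path are assumptions for illustration.
pairing = {
    'persona': 'correlation',            # the role this worker will play
    'coordinator_uri': 'http://coordinator.example.com:8080/v1',
    'username': 'worker-7f3a',           # one-time pairing credentials
    'password': 'one-time-secret',
}
resp = requests.post('http://localhost:8080/v1/pairing/configure',
                     json=pairing, timeout=5)
resp.raise_for_status()   # the worker now authenticates and joins the grid
```

Because the credentials arrive over loopback at pairing time rather than sitting in a config file on disk, a single machine image can serve every persona in the cluster.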
When a coordinator detects that something in the cluster has changed, whether a system is down or an admin has requested that one part of the cluster be expanded, the coordinator has the option of introspecting the cluster and seeing whether it has the ability to forward notifications. These notifications are handed off to a worker persona called a broadcaster. The cluster, for all intents and purposes, can operate without a broadcaster, but what it affords us is that for all of the systems under our control, we can maintain cluster synchronization within a couple of seconds rather than a polling period of minutes or tens of minutes. There we go.

So where do all these events go? Now we've got this big cluster that processes all kinds of data and does all kinds of things, but where does the data go? Honestly, we weren't all that interested in where the data goes. Our job was to collect these events in a uniform way, process them, build structure into these log messages, and then hand them off. We want to let the downstream systems do what they do best. We have systems like Elasticsearch, Mongo, and Logstash, all of these really complex, scalable systems designed around indexing this data. Our job is to feed those systems, not supplant them.

So without further ado, I have an interesting demo where I'll be showing you a full cluster. What I have here is a console that was connected to one of the storage workers, and now we are reconnected to the storage worker. Just for giggles, what I decided to do was set my laptop up here, a Linux laptop, as one of the tenants, so it's going to be committing messages through the system. What we have up here is the endpoint. This is where all of the data for this particular flow goes, and it's just a MongoDB store. So what I'm going to do, oh goodness, is connect to Mongo and show you what's going on here. And my logs: there's nothing in there. You don't see anything in the collection because I haven't started committing events yet. I can make this a wee bit bigger for y'all. There we go.

So we have a worker that hosts the syslog parser that we built, and I've bound my tenant to it. When my system publishes events (I'm going to give this guy a restart, because it's probably wigging out; there we go), they will be intercepted by this worker, the syslog worker. I'm sending standard syslog messages to this worker, and its only job is to take each message, parse it as quickly as possible, and chop it up into a nice JSON format that follows a schema standard called Common Event Expression (CEE). CEE is a collection of XSDs, with JSON representations, backed by a number of vendors, for expressing logs in a standard way. We found it very attractive because it turns out that the default descriptor for CEE actually follows syslog. So all we had to do was parse the message into this basic template, and suddenly we had access to all of the predictability provided by the schema itself.

So without further ado, I'm going to unplug my laptop. You hear it kind of complain a little bit up here. Oh goodness, there are no logs flowing through. Let's see what's going on. I wonder if our syslog is even on.
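For a sense of what the syslog worker's output looks like, here is an illustrative CEE-style JSON record built from the RFC 5424 fields; the exact field names in the Meniscus schema may differ.

```python
import json

# Illustrative CEE-style event (not the exact Meniscus schema): the
# syslog header fields become predictable, indexable JSON keys, and the
# structured data rides along instead of being buried in a blob.
event = {
    'time': '2013-04-15T22:14:15.003Z',
    'host': 'web01',
    'pname': 'nginx',        # syslog APP-NAME, used to identify event flows
    'pri': 'info',
    'ver': '1',
    'msgid': 'ID47',
    'msg': 'GET /index.html 200',
    'native': {              # RFC 5424 structured data, carried verbatim
        'origin': {'ip': '10.0.0.1'},
    },
}
print(json.dumps(event, indent=2))
```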
I'm just restarting my syslog daemon to see if I can force it to play nice. I'll connect back to Mongo. I am not entirely sure why I can't see logs flowing through the system. I apologize. Let's see if I can shed some light here. Old school debugging, right? I sincerely apologize; this was working this morning, but I'm afraid it's just not connecting, so I'm not going to waste anybody's time on it forever.

What you should have seen was the messages flowing from my system to the syslog endpoint. That system then talks to the coordinator and asks: of all of my downstream routes, which ones should I be interested in? As a message comes into the syslog node, the node looks up its routing table and decides which service domain to route to next, and then it would commit that to Mongo. Unfortunately, that doesn't seem to be working the way I'd hoped.

One of the things that we've done for Project Meniscus, which I'm more than happy to show you all, is that we are completely on GitHub. You can go to GitHub right now; all of our code and documentation is out there in the open. We've been designing the system with the idea that we had to be as close to OpenStack as possible, and that was the driving focus for making this as open as possible. When you come to our repos, in addition to all the code being there, we've tried to be really on top of making sure that our documentation is up to date and useful, and it lives in our wiki. The other repo, Portal, by the way, is the syslog relay that we built. Its claim to fame is that it's the fastest Python syslog parser out there, and it supports the latest standard, which gives you unlimited-length messages. We standardized on TCP and octet counting simply because one of the biggest complaints we've gotten from developers about syslog in general is that it cuts off messages, and that's actually a relic of the older UDP transport. So we've moved towards TCP and long-lived connections, simply because that's more efficient at streaming very large messages as well as smaller ones. All of our documentation is up there for anybody to come take a look at. We try to host all of the more important things on GitHub as soon as possible, including our architecture documentation, so you can go through all of our docs and figure out how we made the cluster work, how we coordinate and manage synchronization state, as well as the first couple of processing aspects of this system. That's it. Any questions? Yes.

So that's something that's a little interesting: as much as people want to log events, they also want to filter them. The design that we chose for the grid allows us to do some really cool stuff in terms of filtering, if we dedicate a whole service domain to filtering messages in general. Filtering is very expensive compared to a lot of other operations; for something like compression, you don't really need to comprehend the data, but for filtering you do. So filtering being a process-intensive operation, the nice thing about service domains is that we can scale it horizontally and independently of any other piece of the architecture. That's something we wanted to be able to support, and while we don't support it right now, that type of use case is definitely something we've been looking at and targeting.
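The octet-counting framing mentioned above is simple enough to show in a few lines. This is a minimal sketch of the RFC 6587 scheme, where each message on a long-lived TCP connection is prefixed with its length in bytes, so nothing ever gets truncated; the host and port are placeholders.

```python
import socket

def send_octet_counted(sock, message):
    """Frame one syslog message as 'MSG-LEN SP SYSLOG-MSG' (RFC 6587)."""
    frame = message.encode('utf-8')
    # The receiver reads the decimal length, the space, then exactly
    # that many bytes, so message size is effectively unlimited.
    sock.sendall(str(len(frame)).encode('ascii') + b' ' + frame)

sock = socket.create_connection(('syslog.example.com', 514))
send_octet_counted(sock, '<165>1 2013-04-15T22:14:15.003Z web01 app '
                         '- - - hello, octet counting')
sock.close()
```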
That type of intelligence: we've looked into tracking events as they come in by assigning them unique job IDs. However, that tends to be very expensive, so it's really more about your need. If you do need real-time correlation, then all you need to do is build a system that can comprehend that and add it as a service layer. We didn't tackle that immediately because it's not easily scalable, nor generic enough that we felt comfortable handing it off to customers. So, as one of the slides mentioned, we've decided to hand that off to systems that are a little more in-depth: HDFS, Logstash, for example, or Elasticsearch and Lucene, that sort of thing. Correlation after the fact. However, it's definitely not outside the realm of possibility.

Yes, we have. Unfortunately, I'm horrified that this demo did not work, because that was a full end-to-end flow. We had two service domains defined: one for parsing and processing the syslog messages into a nice uniform JSON structure, and one whose whole job is to commit those messages as quickly and efficiently as possible to Mongo. But yes, it's all peer-to-peer. We looked at queues; unfortunately, a lot of the central-broker ideas that you see in many of the queues, RabbitMQ included, really didn't fit our problem domain. We had to be able to move data as quickly as possible, and any communication back and forth would have been very difficult to scale and very difficult to orchestrate within the cluster. So we decided to take on a little bit of that complexity within the coordination of the cluster itself, by letting the workers understand what's downstream of them and then route to those nodes within the service domain. And much like syslog relays of old, we just do fast failover. If we detect a failure, we tell the coordinator; the coordinator may have some logic where it informs other upstream nodes that, hey, this guy's bad, don't use it; and then we just fail over to the next node within the service domain. Because all of them are stateless, it really doesn't matter which one I route to, as long as I get to one of them.

What I didn't go into was the actual tenant API itself, which is how you configure this system. When you design log flows, you specify a host (you can specify a multitude of hosts), and you identify their event flows by their process name. In syslog, if you're familiar with the spec, there's an APP-NAME field, and that's what we're piggybacking on. So when you design your flows and how you want them to be processed, you talk to this REST API and tell us: I have this event flow, it's identified by this app name; process it by encrypting it, compressing it, and making it durable. Then the cluster, since it has that intelligence, can build those routes intelligently, and when we introspect a message, we correlate it correctly and identify where it needs to go through the cluster.

I see. The coordinator is aware of cluster load. All of the workers talk to the coordinator at defined intervals, and so the cluster may identify that one of the nodes is running out of disk space very fast. That node can flip itself into what's called draining status, and when it starts to drain, it will no longer accept work. The coordinator is made aware of that transition, makes the necessary config changes within MongoDB, and then broadcasts to the upstream nodes: you need to synchronize with me.
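As a sketch of what configuring such an event flow through the tenant REST API might look like: the endpoint path and field names below are assumptions for illustration, not the documented Meniscus API.

```python
import requests

# Hypothetical event-producer registration: "any message whose syslog
# APP-NAME matches 'nginx' must be durable and encrypted, and should
# land in these sinks."
producer = {
    'name': 'nginx-access',
    'pattern': 'nginx',        # matched against the syslog APP-NAME field
    'durable': True,           # survive worker failures without loss
    'encrypted': True,
    'sinks': ['hdfs', 'elasticsearch'],
}
resp = requests.post(
    'https://meniscus.example.com/v1/tenant/acme/producers',
    json=producer,
    headers={'X-Auth-Token': 'REDACTED'},
    timeout=10,
)
resp.raise_for_status()
```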
There's something wrong down below. More or less, it's fast failover to whichever one works, since we're all stateless: I pick one, pick one, pick one, and I continue to route. Right. Luckily, we've had a lot of precedent with large syslog installations. They have run hundreds of store-and-forward nodes where fast failover is the only balancing option available to them, and they perform pretty well. Usually it's when you have cluster failures that these algorithms become more important, and that's what we've been focusing on: making sure that we can route around damaged nodes.

So the nodes themselves actually route to each other; it's a peer-to-peer system. Right now, all we're doing is streaming the JSON messages over TCP, as fast as possible. We're looking at supporting either MessagePack via ZeroMQ or some other efficient binary transport. However, our initial performance numbers have been positive enough that we've been able to push that to the back burner and concentrate more on the cluster orchestrations.

It's not that it's not scalable enough; it's that the difficulty of scaling those systems, in our eyes, was just too much complexity to take on. We had to have business value as soon as possible, with as few dependencies as possible, and a central-broker topology wasn't attractive for this type of system, where you're processing events in basically large streamed pipelines.

So far, all we've done is really index them. We use Elasticsearch. One of the nice things about this system is that we deal with structured data and only structured data. For all of the inputs, all of the sources of this system, whether it be syslog, AMQP, what have you, we enforce that structure is put on the message before it's submitted to the grid and forwarded on. So we tend to standardize on document stores for our data stores, simply because Elasticsearch and its scaling story are really powerful, and if all of your data is structured, you can build very rich queries and do a lot of correlation very quickly. We decided not to tackle the analytics side of this, mainly because there are much better solutions than we could probably ever come up with in that realm.

In addition, the system is smart enough to multiplex messages. When we route to different service domains, a message may require long-term storage, in which case we route it to a long-term durable store like HDFS. However, you may also want real-time indexing and searching, or near-real-time latency of a couple of minutes. Well, we can take that message and play it to the HDFS service domain and have it persisted in HDFS, and also play it to the MongoDB storage back end and have it forwarded to Elasticsearch via the MongoDB Elasticsearch river.

Yes, two questions. Kind of funny to say, thanks. A lot of people are doing Storm and Kafka queues for that sort of processing; what's the reason you chose not to do Storm and Kafka? And I have a follow-up question on that. The scaling story that we could provide for ourselves by managing the cluster ourselves and having peer-to-peer was that it allowed us to really utilize and saturate the network far more quickly than having a broker field those messages for us. It's more about cutting out the middleman, simply because it doesn't fit within the domain of a streaming pipeline as we saw it.
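The multiplexing described here boils down to fanning one message out to every sink its event producer asked for. A toy sketch, purely illustrative of the idea rather than Meniscus internals:

```python
def fan_out(message, routes):
    """Replicate a message to every sink its event producer asked for."""
    for route in routes:
        route(message)

def to_hdfs(message):
    # long-term durable store for after-the-fact correlation
    print('durable store:', message['msg'])

def to_mongo(message):
    # the MongoDB -> Elasticsearch river then picks this up for
    # near-real-time indexing and search
    print('indexing store:', message['msg'])

fan_out({'msg': 'disk 87% full on web01'}, [to_hdfs, to_mongo])
```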
We add intelligence to the system by adding layers that you can route to and then replicate messages to. And for an analytics pipeline, you'd want to route that to a Kafka queue, or whatever other sort of queue, to put it in there? Exactly: AMQP. Right.

So that kind of relates to the second question. We have a large volume of data, and at some point we see we're going to have to switch from a JSON format. A lot of our event streams are not just syslog: there are other application logs, other non-standard events that we want to put in, whether derived events or derived metrics, and at some point we're probably looking at the need for some other sort of serialized data format, not JSON. Have you seen a limit to that sort of thing yet?

For most of our use cases, the performance numbers have been promising enough that we really haven't dug into a lot of the alternative serialization standards. But in that realm, Avro is actually the big one we're looking at for translating those pieces of information into systems like HDFS, where Avro is a much more succinct way of storing the data, because the schema comes with it, rather than just dumping the raw JSON. In addition, JSON has a lot of control structures that are very beneficial for transport but not necessarily for storage, which is why you'll see MongoDB use BSON, which also gives you some amount of static typing; that's very useful if you're doing structured queries and the like.

What about non-syslog events? Are you planning on adding those, or are you purely focusing on the best way to do syslog right now? So actually, we have two inputs into the system right now. Our primary input is syslog, simply because that gave us the widest applicability. However, we're also looking at the mobile space, where mobile clients can't keep TCP connections open for very long; it's very inefficient for a phone to keep that connection open. So what they tend to do is use HTTP to transmit those messages, and so we have an HTTP input that accepts JSON plus CEE, the specification we standardized on for formatting the messages. So to that point, HTTP and syslog are the current inputs, but we also have plans for integrating with AMQP to consume Nova events. We also have plans to integrate with... oh, what was that? I apologize, it escapes me. But the way the service domains are designed, they're flexible enough to accommodate the upstream provider, and that's really the goal here: we wanted to be an integration piece, moving this data to the places where it needs to go.

In addition, if a message doesn't have any definite structure, for example it's not a syslog message, it's not a host message, it's just a blob of data, we support that as well. It's a little less efficient, because you're indexing something that doesn't have any structure, but to the system it's just another stream of data. The point of that is, say you wanted to communicate configuration files when an event happened. What's the state of the system? Well, let's transmit all the config files, but encrypted, so storing them isn't dangerous anymore. Those kinds of use cases we're also trying to target.

We did look at ZooKeeper; unfortunately, its place within the architecture wasn't very solid, and we had a couple of issues scaling it in our earlier tests.
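A hedged example of what publishing through that HTTP input could look like, with an assumed URL and auth header, since the concrete endpoint isn't shown here:

```python
import requests

# A client that can't hold a TCP syslog connection open (a phone, say)
# POSTs a CEE-style JSON event over HTTP instead. URL, path, and header
# names are illustrative assumptions.
event = {
    'time': '2013-04-15T22:14:15.003Z',
    'host': 'mobile-client-42',
    'pname': 'acme-app',
    'msg': 'user tapped checkout',
}
requests.post('https://meniscus.example.com/v1/tenant/acme/publish',
              json=event,
              headers={'X-Auth-Token': 'REDACTED'},
              timeout=5).raise_for_status()
```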
In addition, one of the tenets of the Meniscus project was to build everything on top of Python. We tried to be as Pythonic as possible, to maintain a low barrier of entry for integration with systems that may not have Java installed, for example. That was really our goal there: to stay as close to the UNIX system as possible. So while ZooKeeper could have answered many of our problems, we decided to take on some of that coordination ourselves, simply because it gave us a lot more power and control over how we integrate with other systems. And it's not just the Java argument; it's ease of use. To that point, I want people to contribute. I want as many contributors as possible, and Python is an excellent language for that. That's why we spent a lot of time researching how to make Python fast: it was very valuable for us to have sysadmins committing code back to the code base in a language they're used to, and maintaining that was imperative in our opinion. Not seeing any more questions... so, thank you very much for coming.

Oh, we're actually getting ready to build out our first development environment, where we will be consuming live logs from Nova, Rackspace's internal Nova. We're looking at a Q2 date for that; basically, we want to have the beta ready for internal consumption by the end of Q2. Unfortunately the demo didn't work, but it would have shown you how far we'd come along in this endeavor. But yes, we are planning for rapid implementation, and hopefully Q3 is when we'll start hitting beta customers.

Thankfully, MongoDB is quite efficient at what it does, which is being a distributed data store that maintains consistency and replication, and we built the coordination system of the cluster on top of that. With very few coordinators you can manage a cluster of around 200 nodes. In the test that I went through, we had 20 nodes spun up, and we were able to use just two coordinators (two for high availability) without any problems. However, our use cases extend into clusters ranging into the thousands of nodes, and that was also the impetus for making sure we could maintain cluster state very efficiently and very cheaply. In fact, most of the nodes within the system only use around 38 to 50 bytes to communicate their state or receive state.

When a customer is provisioned within the system, depending on their needs and their event flows, we give them an endpoint they can publish to: a syslog endpoint if they choose syslog, or, if they choose the HTTP input, we give them that endpoint instead. So for the integrations we have, when you provision your event flows, we hand you an endpoint you can publish to, and that'll be a load-balanced VIP that just forwards to the service domain in question. For syslog it's a TCP endpoint; for HTTP it's obviously TCP as well. We will be supporting TLS as well as SSL for both of those endpoints, so we're trying to enforce end-to-end transport encryption within this cluster. That was another reason it was attractive for us to build everything on top of TCP and stream it: TLS gives you that security essentially for free, because it's a nice abstraction that we don't really deal with; it's dealt with at the socket layer.

Yes. In the case where a customer needs to front-load encryption, they will have a piece of Meniscus that talks to Rackspace's systems for coordination and synchronization. They
can actually host part of the Meniscus cluster in their own infrastructure and still have that tight integration, because all it's doing is forwarding to the next service domain. Any other questions? Thank you very much for coming. I really appreciate the turnout.