Alright, so we are going to get started. We've got Adam Mollenkopf from Esri. He's going to talk about applying geospatial analytics at a massive scale using Kafka, Spark, and Elasticsearch on DC/OS.

Alright, thank you, Ravi. Can you guys hear me okay? We're all good. Alright, so I'm Adam Mollenkopf. I work at a company called Esri. We're based out of Redlands, California, which is just about an hour and a half east of here, or, if you traveled yesterday morning, three hours to get here in the traffic. So we're local. I'm going to be talking about applying geospatial analytics at massive scale and about some of the things we've done at Esri. I've also got a really cool companion GitHub site that goes along with some of the things you'll see here, where you can go to learn a lot more than what we'll be able to cover in the 40 minutes we have today. I'm responsible for the real-time and big data capabilities at Esri, so this basically means the IoT strategy that we have as a company: bringing in sensor data and other things.

Esri as a company, in case you're not familiar with us, has been around since 1969. We're one of the largest companies that do geographic software, or spatial software if you want to call it that. We pretty much invented geographic information systems back in the late 60s, and a lot of the original folks that did that are still with the company. We license our software to over 350,000 user organizations around the world. So we make geospatial software. This isn't really a session about Esri; I just wanted to give you a little bit of background in case you didn't know who we are.

As far as the agenda for this discussion today, I'm going to talk about the emergence of a new class of problem that's driving us to take a new architectural approach from what we've done in the past with geospatial. We'll talk about our approach to massive scale, which is obviously using DC/OS and a number of the packages in the Mesosphere ecosystem. We'll talk about geospatial analytics, and I'll give you a few samples of the types of analytics that we perform for our customers. Then we'll talk about an important topic, which is writing our applications in a way that they can be portable across multiple environments, whether that's public cloud, private cloud, or on premise, and how we can use DC/OS as the way to deliver those in a consistent way.

This is a slide that captures the essence of what we're seeing in the space over the last few years. We have traditional customers that have been doing what we call real-time GIS, where they might track their snow plows, their police vehicles, their fire equipment, and other things that are out there. Typically, a few years ago, they were only tracking one or two types of assets. What we're seeing now is that we have cities like Dubai and Singapore and other places around the world that are pretty much tracking everything. These are a sampling of some of the things that our customers track on a continuous basis. Each of these signals comes from different devices that are out in the field. They report lots of different information, be it an environmental sensor reporting air quality or water quality, or weather events that they're trying to correlate to things happening inside a company. Basically, they want to be able to bring all of this information in, in near real time, as quickly as possible, visualize it, and have a situational awareness display of what's going on.
Then they can use that as a common operational picture to affect different things. More recently, with some of the hurricanes in the last couple of weeks, we've had people deployed on site, tracking the water levels and all kinds of other things, helping FEMA and other organizations respond to those situations. This brings a new velocity of data; it's not just one feed of vehicles coming in, it's all these different feeds providing that.

What we've traditionally done for customers is deliver a multi-machine system. This is kind of our old approach before DC/OS. We would recommend an environment where people bring in and ingest this real-time data using our software, store the data in a spatiotemporal way, spatiotemporal meaning recording that data based on space and time and having it optimized so you can access the data that way. Then they can run ad hoc and scheduled analytics on the observations that are stored. Then finally, they can visualize that information, either as a live display or, DVR-style, rewinding to what happened an hour ago, two days ago, or a year ago. That's a traditional deployment that we've had.

What we're seeing now with this emergence of IoT is a new class of customers that go well beyond that. They can't just use a few machines and process thousands of events per second. We have customers that need to do millions of events per second, potentially, depending on all the different feeds they have. Just taking our traditional architecture and trying to deploy it across tens or hundreds or thousands of machines is not really a reasonable or tenable thing to do. This requires a new approach.

Our massive-scale approach is DC/OS. If you took the concept of a data center, or a rack of machines in a data center, and treated that as one logical unit, we treat this as one operating system, thus the data center operating system. Instead of scheduling our software to run on 11 different machines and being very specific that this machine is for storage and this machine is for ingestion, we just treat it as one operating system where we schedule work to run. We don't look at it as 33 machines; we look at it as a whole bunch of resources that are available that we can schedule work on. So we have a lot of RAM, a lot of storage, a lot of CPU resources that we can make use of, and then we schedule work to run on that. This is basically what DC/OS is all about, and it has given us a different starting point for deploying these applications for customers. Treating the entire data center, not just the 30 nodes, as one operating system is quite useful.

This is not new; this is something that's been proven and out in the industry for some time. This is a listing of some of the companies that have been using the underlying technologies, and then DC/OS as well as it's emerged over the last year and a half or so. So we're standing on the shoulders of giants that have proven this architecture, we're basically applying our software to it, and we're adding Esri to this list of logos as well.

So we have a project within Esri that we call Trinity. It's Trinity like the Matrix character. I have the fortune of traveling all around the world as part of my job, and when I talked about Trinity in Dubai, they were like, cool, like Christianity? No, no, not Christianity. And then I went to Germany and they were like, oh, like the atomic bomb project.
Oh, no, no, no. So Trinity has turned out to be the worst name for a project ever, but everybody knows it by that name, so I always qualify it with the Matrix character now. That's why we have Trinity from The Matrix here. We entered a partnership with Mesosphere about a year and a half, two years ago, to work on re-architecting Esri software in a way that we can span across a massive scale. That's what our project Trinity is about.

Our before picture looks something like this, where we had this triangle, or the trinity if you want to call it that, of ingesting real-time data from sensors, storing that data, being able to run ad hoc or scheduled analysis on it after the fact, and then being able to visualize that information over time. Our before picture and our scorecard for what we could do with our traditional software stack was in this range, where we can ingest thousands of events per second, which met probably 95% of our users' use cases. But we have that new class of customer that's in the hundreds of thousands or millions of events per second that we need to handle. And then, from a streaming analysis and batch analysis perspective, we also need to be able to handle lots more velocity and lots more volume, and store far more than millions of events.

So our after picture is that we needed to move to something that looked like this, where we're satisfying that new class of customer. And for us to do that, we had to re-architect our approach. It wasn't just take our existing software, put a Docker container around it, and deploy a whole bunch of them; that wouldn't really work too well. So we had to re-architect our software. What we did is we went through a containerization process, and instead of having a monolithic application, we broke it into small microservices. What was our ingestion path has been broken into things that sit at the edge and collect data. We have Kafka that sits in between. We have Spark streaming jobs, which are the purple items, that write to storage systems, which is Elasticsearch. And then we have visualization techniques on top of that. What this allows us to do is scale up any aspect of the system based on a customer's need. So if they're a connected-car customer, like you've seen on stage, they have millions of events per second coming in, so they're going to need lots of those black sources sitting at the edge to collect these things, with lots of redundancy and lots of availability in those services, and then lots of Kafka brokers and lots of other things running as well. So we take our before picture, re-architect that into these containers, and we have our Project Trinity, which has broken that out into nice microservices.

These are a listing of some of the container types that we have, with the underlying technology underneath each of them. As I mentioned before, our sources sit at the edge, and they are responsible for connecting to some device out in the field. We have probably 50 or 60 different sources, everything from listening to a live weather feed, to connected cars that are streaming in, to a fleet of FedEx or UPS vehicles, to be able to bring in all those different types of feeds. So those are connectors or adapters for different device types. Those we write in Scala and we deliver them; this is kind of the reactive-application-style approach. We typically write those events to a gateway that is backed by Kafka brokers.
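To make that last step concrete, here is a minimal sketch, in the spirit of those Scala source adapters, of an edge connector that forwards device observations to a Kafka-backed gateway. The topic name, broker address, and the deviceFeed connector are placeholders assumed for illustration, not Esri's actual implementation.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object VehicleSource {

  case class Observation(trackId: String, lon: Double, lat: Double, time: Long) {
    def toCsv: String = s"$trackId,$lon,$lat,$time"
  }

  // Placeholder for a real device connector (weather feed, connected cars, fleet AVL, ...).
  def deviceFeed(): Iterator[Observation] = Iterator.empty

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Hypothetical address for the Kafka brokers running inside the cluster.
    props.put("bootstrap.servers", "broker.kafka.example:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // Poll the device feed and forward each observation to the Kafka "gateway".
    // Keying by track id keeps a vehicle's observations on the same partition.
    deviceFeed().foreach { obs =>
      producer.send(new ProducerRecord[String, String]("vehicle-positions", obs.trackId, obs.toCsv))
    }
    producer.close()
  }
}
```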
Those topics are available for Spark streaming jobs to consume. And then we have our geospatial analysis capabilities within those real-time analytics. This geospatial analysis is stuff we've added to Spark: we spent a lot of time over the last two and a half years adding user-defined types for geometry and user-defined functions for different geospatial operations, which we'll go through in a second. And then our batch analysis is the same story, with user-defined types and user-defined functions in Spark. We write that data to Elasticsearch. Elasticsearch is really good at storing geospatial data, but we've extended it even further with additional capabilities. And then we expose all of the data in our spatiotemporal store with a Play app, a reactive web application that people can access to generate map images and visualizations, which we'll see a cool demo of in a minute.

For our geospatial analytics, just to give you a quick sampling, we're going to dive into these two container types, the Spark pieces we've been talking about. So it's Spark with Esri extensions that we've written. These are a sampling of some of the different analytics that we have. I'm not going to cover all of these, but we have capabilities such as aggregate points, where I might want to take observational data from sensors, and they may be moving sensors, and rather than put a million dots on the map, I want to see the density of those things, and I want to see it in near real time as it's moving around. The aggregate points tool allows us to do that. We can also do things like join features. When we join features, it's kind of like a relational database join, but it's joining with space and time as well: which of these dots are in these polygons, and which of these dots are in these polygons within this time window? So we can do spatiotemporal joins. There are a lot of other analytic capabilities that we're not going to go through in this session; this is just a sampling of a few of them.

A couple of real-world, concrete examples of these analytics: which crime events occurred near sporting events, spatially and temporally? That might be a question somebody wants to ask. Or, what bodies of water intersect cities with populations greater than a million people? And that million people could actually be not demographic data, but real-time cellular phone data that's coming in. So I have a million people here right now because I have anonymized cell phone data feeds coming in. That can work with real-time data, static data, or data that's changing on a continuous basis. Another example is, what traffic jams occurred because of car accidents? How can I correlate those things? What was the weather like when those car accidents happened? How do I correlate all of these things together?

If we go into that tool a little bit more, what a user specifies is what type of input they have. It might be, let's say, cell phone data, and I want to join that cell phone data into maybe zip codes or province boundaries or things of that nature, or maybe operational boundaries for my company or my organization. The output that I want to get is a polygon that represents the count of those things. And we can do this by any combination of space, time, or attribute. These are part of the user-defined functions that we've added to Spark.
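To give a feel for what such a spatiotemporal join predicate boils down to, here is a small, self-contained Spark sketch: a plain ray-casting point-in-polygon test combined with a time-window check, applied to toy crime and sporting-event data. It uses no Esri libraries and the geometry handling is deliberately naive; the real user-defined types and functions cover arbitrary geometries, spatial references, and much more.

```scala
import org.apache.spark.sql.SparkSession

object SpatioTemporalJoin {

  case class Obs(id: String, lon: Double, lat: Double, t: Long)

  // Ray-casting point-in-polygon test; the polygon is a ring of (lon, lat) vertices.
  def contains(poly: Seq[(Double, Double)], x: Double, y: Double): Boolean = {
    var inside = false
    var j = poly.length - 1
    for (i <- poly.indices) {
      val (xi, yi) = poly(i)
      val (xj, yj) = poly(j)
      if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) inside = !inside
      j = i
    }
    inside
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("spatiotemporal-join").getOrCreate()
    import spark.implicits._

    // Crime observations (points in space and time); stand-ins for a real feed.
    val crimes = Seq(Obs("c1", -117.19, 34.05, 950L), Obs("c2", -117.00, 34.30, 2000L)).toDS()

    // A sporting-event venue as a polygon plus its time window.
    val venue = Seq((-117.2, 34.0), (-117.1, 34.0), (-117.1, 34.1), (-117.2, 34.1))
    val (eventStart, eventEnd) = (900L, 1100L)

    // "Which crimes occurred inside the venue during the event?": a space AND time predicate.
    val hits = crimes.filter(o =>
      contains(venue, o.lon, o.lat) && o.t >= eventStart && o.t <= eventEnd)
    hits.show()
  }
}
```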
So that's why it says powered by Esri. We have temporal operators; there are 13 or so of these. We took them from the David Luckham book, if you guys are into those things. These are the temporal operators that we've added, and we also have a bunch of spatial operators, and you can combine these in any form or fashion that you like to do pretty sophisticated analysis. To expand on the temporal operators a bit more, this is kind of a visual depiction of it. So I can ask: did this weather event occur during a car accident? Did it begin with one? Did it intersect or coincide with a car accident that happened? You can start to do really sophisticated analysis with these things.

Another tool that we have is aggregate points. This is what a lot of people like to start with, just to understand their data. I have billions of observations from sensors that are out there, maybe cell phone data, maybe car movement, so what does my data look like? I just want to plot it against zip codes or other boundaries that I have. Or I could ask things like, where are the most power outages occurring because of the hurricane, if I have a feed of that information? Or, what zip codes have the highest count of crimes and incidents? And again, that could be a real-time feed coming in. So those are some of the examples. Aggregate points pretty much takes point data, or observational data, and joins it into polygons or other features that you have, and the result is a polygon layer that's densified, so you can see the counts of those things: red being more incidents, blue being fewer. I apologize for why that's flickering; I have no idea.

This can be done on a two-dimensional basis, which is what's described here. If you don't have polygons that you want to join with, you can join into just bins. So I could say, aggregate this cell phone movement in this mall at a 10-meter resolution: I want 10-meter bins, and that's what the top picture is showing, so I can do that in two-dimensional space. What's more interesting is that I can actually do this in a three-dimensional space where the third dimension is time. We call these things space-time cubes. You can take this data, add a time dimension to it, and say, for every hour of the day, so 24 different vertical slices, I want to see the distribution of the data and how it's changed over time. This might be population movement, or it might be other things you want to consider. And to blow your mind a little bit further, you could even do this not just into cubes but into volumes, volumes being polygons or other things, so these could be zip code boundaries or some other kind of shape that you draw and want to do some analysis on.

To explain this a little bit more, I'm going to show you a quick little video demo of a project that we have. This is data from a business partner of ours called SafeGraph. It's anonymized cell phone data in New York City, and what we're looking at here is some analysis that we pre-computed. So this is mobile phone activity in Manhattan. We can slide through time with this little time slider on the bottom, and you can see the density of things. We're extruding the values based on the counts of mobile phones that are being counted at that particular time.
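Under the hood, that kind of space-time aggregation comes down to assigning each observation a spatial bin plus a time slice and counting per cell. Here is a rough Spark sketch of a space-time cube keyed by a square bin and the hour of day; the column names, bin size, and sample rows are invented for illustration and are not the actual SafeGraph schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SpaceTimeCube {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("space-time-cube").getOrCreate()
    import spark.implicits._

    // Anonymized phone observations: device id, lon/lat, epoch milliseconds (stand-in data).
    val obs = Seq(
      ("d1", -73.9857, 40.7484, 1504272000000L),
      ("d2", -73.9860, 40.7480, 1504275600000L)
    ).toDF("device", "lon", "lat", "t")

    // Roughly 10-meter cells at this latitude; pick the cell size to match the desired resolution.
    val cellDeg = 0.0001

    val cube = obs
      .withColumn("binX", floor($"lon" / cellDeg))
      .withColumn("binY", floor($"lat" / cellDeg))
      // The time dimension of the cube: one vertical slice per hour of the day.
      .withColumn("hourOfDay", hour(from_unixtime(($"t" / 1000).cast("long"))))
      .groupBy("binX", "binY", "hourOfDay")
      .agg(count("*").as("phoneCount"))

    cube.show()
  }
}
```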
We can do this by aggregating the data into bins, or into triangles or hexagons or even cylinders. These are some of the extensions that we've done on top of Elasticsearch to be able to aggregate by other types of shapes. Another interesting view of this is what we call a space-time ripple surface. The space-time ripple surface is exactly what it sounds like: we're depicting the volume of the data based on where it is. Shoot, I don't know why that's flickering. But as I span through time here, I can see vertically where most of the density is occurring. This is just another nice way to visualize this information and see where the distribution of the data was. I could further drill into this and say, just show me Android users, or show me iPhone users, or show me users that are older than 30. So I can be interactive with this and change it on the fly.

To add that temporal dimension we were talking about before, we can go into a small little study area here. We might want to know, over time, what was the mobile phone population in this specific area. As I slide through time, what I'm going to see vertically is the different time dimensions, so these are different hours of the day or different days of the week, based on what's occurring. We can pan around this area and interact with it, and we can also change the visualization to the different shapes that we've indexed inside our storage. So this is just a nice, cool way to visualize the data. It's very interactive; people can do queries against this and run what-if scenarios against it. That one's not open source, sorry. I wish I could give that one away.

I'm going to shift into a slightly different topic now, which is deployment portability. This is what a cluster ends up looking like when we deploy this for a user. We take our containers, and based on that user's need we deploy a different number of instances of them. We can define that each of these Spark streaming jobs is going to consume two or three cores and have X amount of RAM, but I want 20 of those to run in this cluster because it's a connected-car scenario and there's lots of data feeding in. And I need lots of those black sources sitting at the edge to handle the events that are coming through. Then I need lots of storage data nodes so that I can store this data. And finally, I need visualization tasks running so that people can query a web app and generate the rendering I was showing in the demo we just looked at a minute ago. So this is what an end cluster looks like; this is with our Esri software working.

I wanted to give you guys something that you could play around with. So what I did is create this project called dcos-iot-demo; that's the GitHub link up above. The pieces that are there, the parts I could give away, are open source. I basically abstracted away all of our Esri proprietary stuff and said, what if I just took Scala, Kafka, Spark, Elasticsearch, and Play? I don't have a cool SMACK-style acronym; maybe you could come up with one for those things. But this is basically using these underlying technologies without any kind of proprietary Esri stuff in it.
It's just an example of how you can connect all these things together to build an application on top of DC/OS. I don't have enough time in this session to actually go through the demo, but there's a video and a number of other things up on that repo that walk you through it. What I wanted to describe is that when we ship this to a user, our users dictate: I want this to run on Amazon, or I want this to run on Azure, or I want this to run in GovCloud or a C2S environment for the government, or I want it to run on premise because I have my own hardware and my own data center. So we don't really have the luxury of saying we deliver the software on Amazon and that's it, that's the only environment we have to support. One of the key benefits of using DC/OS, and of selecting it as the way we package our software and deliver it to customers, is that it makes this portable across multiple different infrastructures. And as Toby and other folks were pointing out this morning, it's for real.

This dcos-iot-demo walks through the installation steps for doing that in various different environments. If you go to that project, what you'll see is eight or so steps on how to get this environment together, and number one is how to deploy this across different environments. Azure, Amazon, and on-premise have some pretty decent documentation around them. C2S is coming soon, but that's a very specialized thing, so if you want to talk more about that, let me know. But we have very good documentation up there. It's step by step; it'll probably take you a couple of days to walk through the tutorial and stand up the environment for yourself, but it's all available there and it goes through all the steps. I'm just going to visually walk through what happens.

To provision this environment on any of these different providers, really the only thing that's cloud-specific or provider-specific is the first step, which is provisioning your actual resources. When you go to Azure or Amazon, there are different ways to do that. For Azure, there's a thing called ARM templates; those are like the CloudFormation templates inside Amazon. So you would define an Azure template. We actually created an ARM template. David in the audience, part of my team, created this really cool ARM template that allows you to just type in, I want three masters, 30 private agents, and three public agents, and boom, you've got a resource that looks like this. If you do it on Amazon, there are CloudFormation templates. If you do it on C2S, it's CloudFormation templates as well, but it's an offline system in a disconnected environment. And if you want to do it on premise, it's basically IT administrators setting up the environment, so there are a lot of steps to do it on premise. We've taken a first attempt at documenting what those are, but the Mesosphere documentation has comprehensive information about that as well. Basically, this is the only step that's different depending on the infrastructure. Once you have these resources in place, and all the prerequisites in place, everything is the same regardless of what infrastructure you're using.

To install DC/OS, we took the Mesosphere installation templates and generalized them a bit more, and basically we copy those up; these come from that GitHub site.
So I'm just copying up my private key and my credentials, and I'm copying up the installation script. This installation script is part of the repo that we shared. Once I have that, I SSH into the boot node. The boot node is one of the administrative nodes that allows me to actually install DC/OS and manage the infrastructure. Once I do that, I just run an installation script and give it arguments for how many masters I want, how many private agents I want, and how many public agents I want in my environment. When I do that, it asks me what type of DC/OS installation I want: do you want the latest open source version, do you want an enterprise version, which version do you want? You can type in a URL if you want a specific enterprise version, and then you put your credentials in, and voila, it installs.

Provisioning the actual machines on Amazon or Azure takes about three to four minutes to spin up 30 nodes, so it doesn't really take too long. And running this installation script and laying down bare-bones DC/OS, Mesos, Marathon, and everything else that comes along with it takes another six or seven minutes. So within 10 minutes, you can have a 30-node DC/OS cluster at your disposal to start doing things with. After this happens, it runs through and tells you what your masters are, and then it gives you some stats about how long it took; this one took about seven minutes or so.

At that point, we have a DC/OS environment that's ready to go, and we don't really have any services deployed to it yet. What's running is: all the agents have Mesos loaded on them, all of the masters are ready to go and participating in a quorum, Marathon is available, and all the other frameworks that are in the Universe are available for use within this environment.

The next step, as part of our demo app, is to install Kafka. You go to the Universe, install Kafka, and say, I want a five-broker system. You specify the cores, the RAM, et cetera that you want, and a few seconds or a couple of minutes later, you have a five-broker system in that environment. Similarly, Elasticsearch is part of the Elastic package, so you can go to the Universe and install Elastic, and you'll get the whole ELK stack: you'll get Logstash, you'll get Kibana if you want to install that, and you'll get all the other parts of the ELK stack as well. But you get data nodes of Elasticsearch, and you can specify how many data nodes you want and what size of allocation you want to give to each of them as far as RAM and storage and everything else, and then you can spin that up. So another couple of minutes later, you have a 10-node Elasticsearch cluster. Now we're at about 15 minutes, so we're doing pretty well; if you tried to do this by hand, it would probably take you days.

Then we can deploy our reactive web app. Once a developer has written this app, it's pretty easy; all the source code is available in that repo, and you can deploy this reactive Play app out there. This is the app that allows people to query and generate the map visualizations we saw before. And then we want to run some Spark streaming jobs. Again, that Spark streaming job can be sized to have as many instances as you want; in this case, it's five instances. And that Spark streaming job is going to be listening to topics on Kafka.
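As a rough illustration of what one of those streaming jobs might look like without any Esri extensions, here is a sketch that consumes a Kafka topic with the spark-streaming-kafka-0-10 connector and indexes the records into Elasticsearch with the elasticsearch-hadoop library. The topic, endpoint addresses, record format, and index name are assumptions for the example, and the target index would need a geo_point mapping for the location field.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.elasticsearch.spark.rdd.EsSpark

object TaxiStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("taxi-stream")
      .set("es.nodes", "elasticsearch.example:9200") // hypothetical Elasticsearch endpoint
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker.kafka.example:9092", // hypothetical Kafka brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "taxi-stream"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("taxi-positions"), kafkaParams))

    // Parse "id,lon,lat,time" CSV records and index them as documents with a "lat,lon" geo field.
    stream.map(_.value.split(','))
      .filter(_.length == 4)
      .map(f => Map("id" -> f(0), "geo" -> s"${f(2)},${f(1)}", "time" -> f(3).toLong))
      .foreachRDD(rdd => EsSpark.saveToEs(rdd, "taxi/observation"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```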
And then we finally need something to actually write data into Kafka. So we have a Kafka producer application, what we call a source, and those sources are written in Scala. The source that we ship with this repo just reads a big CSV file, I think it's a gig and a half, that's on S3, loads it into memory, and starts pushing taxi cab movement events into the Kafka system. It's a fairly simple source. At that point, we have our application running, and we can query it through the map interface, hitting the Play app, and start to visualize the information. You get time sliders and all kinds of other things you can play around with in the data. It's a pretty cool sample app. If you saw the MesosCon keynote last year, we showed a very early form of this back then. We've since updated it to the latest specs. DC/OS 1.10 came out last week, so I haven't tested it against that, but I'll do that shortly and make sure it works with 1.10. And then you should have an environment to play around with and do things with.

The end result is that you get an interface that looks something like this. You can see a time slider at the bottom. You can render things with the geohash aggregations, which is what's on the left, and you can see the density: red is where more taxi cabs are moving around at that point in time, and yellow is less density. You can zoom in and out, it'll change the shapes based on that, and you'll see different levels of aggregation as you zoom in and out, and you can pan through time as well. And if you want to visualize it as a heat map, what actually happens is the data comes back from Elasticsearch into the client, and the client generates the heat maps on the client side. The heat maps let you visualize the data in a slightly different way. I wish I could give away the cool 3D thing that we showed before.

So that's pretty much what I wanted to go through in this session. I want to leave good time for Q&A, so we have about 10 minutes. Is this pretty cool stuff? Yeah. All right. What questions do you have? Yes, sir.

So the question is about how we do simulations; that's the concise way, I think, to summarize it. Simulations can be done as part of the batch analysis, the green things that we have there. We oftentimes have customers, specifically in the military, that want to ask, what if we go here and do X? What would happen in that scenario? They can run the analytics that we have as capabilities and get a result from that. And then, if they want to play back that data and see it occur over time, they can feed that data back in through this loop. It's not necessarily real time, it's simulated data, and they can actually control the velocity of it: play it back at 10 times normal speed, or play it back at less than normal speed. That's typically how folks do that. Simulation is a very big part of what we do as well: what-if analysis. What if this circumstance that we actually recorded was different? What if we had responded sooner; what might have happened? What if the water levels rise X amount in the next X minutes; what will that do to affect things? That's where you see things like, here's the flood plain of New York City if the water level rises to this level. So yeah, Esri definitely has tools to help with that, but it doesn't have to be done in real time.
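A replay-style source along those lines can be sketched roughly like this: read recorded observations from a CSV and push them back into Kafka, shrinking the gaps between events by a chosen speed factor. The file name, field layout, topic, and broker address are placeholders rather than the demo's actual values.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

object ReplaySource {
  def main(args: Array[String]): Unit = {
    val speedFactor = 10.0 // replay at 10x the speed of the original recording

    val props = new Properties()
    props.put("bootstrap.servers", "broker.kafka.example:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // Assumed layout per line: trackId,lon,lat,epochMillis, sorted by time in the recording.
    var lastEventTime = -1L
    for (line <- Source.fromFile("recorded-observations.csv").getLines()) {
      val fields = line.split(',')
      val t = fields(3).toLong
      if (lastEventTime >= 0) {
        // Sleep for the original gap between events, compressed by the speed factor.
        val waitMs = ((t - lastEventTime) / speedFactor).toLong
        if (waitMs > 0) Thread.sleep(waitMs)
      }
      lastEventTime = t
      producer.send(new ProducerRecord[String, String]("taxi-positions", fields(0), line))
    }
    producer.close()
  }
}
```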
It's just a feed of data that you can feed back in. If it's not real time, you can go back inside our tools and actually rewind it through the time sliders and the other things that we have. But if you want to simulate it as though it's happening now, and have it look like it's happening now, you would feed it back through this loop, and then you can see it over time. Question in the back.

So the question is, do we have any challenges deploying Kafka or other things on DC/OS, versus doing it outside of DC/OS? No, it's much easier within DC/OS. It gives us a very deterministic way to deliver these brokers, and it does that through the scheduling system of DC/OS and Mesos, where the work will actually land on nodes that have availability. So we don't really have to think about "this machine is for Kafka" or "this machine is for Elasticsearch"; we can have a general framework and get much better utilization out of the systems. If we want to give hints to the scheduler and say, I actually want this to be a data node and this to be an ingestion node, you can do that as well through DC/OS and Mesos, through what's called a placement constraint. Or if I don't want two Kafka brokers to land on the same machine, I can give it unique-host constraints, so only one runs per machine. You have a lot of flexibility in how you do that, but it's much easier to deploy within DC/OS than it is to try to do it on your own.

So the question is, are the extensions we wrote for Spark open source? They're not, unfortunately. There are a couple of samples in the repo that talk about how to use our open source. We do have an open source library called GIS Tools for Hadoop, and it also works with Spark. So we have a couple of examples; I think there are five or six operators that do geospatial things with Spark. There are no user-defined types, only user-defined functions, but that's a good starting point. And there is an app in this repo that uses that to do geofence detection: tell me when this enters this area. So if you want a basic example that you could expand further, you can take a look there. Yes.

So the question is, have you done anything with autoscaling? We haven't yet. The way we deliver these things right now is as a managed service: we deliver it for the customer and kind of manage it for them, so we monitor it and have humans involved to do the scaling and things like that. It wouldn't be hard to add autoscaling, and there are some new capabilities coming in DC/OS that'll help with that. But having good instrumentation is super important for all of this. A lot of times when you see talks about DC/OS and monitoring, it's all about what the container CPU and RAM are doing. But when you build a higher-order application like this, it's really important to think about what application metrics you want. So it's more about how many bytes per second am I streaming through the system, or how many car observations have I received, when I know I'm supposed to expect 100,000 to 200,000 events per second. If I get something drastically different from that, I need to sound an alarm so somebody can do something about it. Or if the response time of my map visualization generation is supposed to be sub-second and I'm not getting that, I need to get those stats in.
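As a minimal illustration of that kind of application-level check, here is a sketch that counts observations and flags when the observed rate falls outside an expected band. The class, thresholds, and plain println alert are hypothetical; in the real system you would push these numbers into the platform's metrics pipeline rather than just print them.

```scala
import java.util.concurrent.atomic.AtomicLong

class ThroughputMonitor(expectedMinPerSec: Long, expectedMaxPerSec: Long) {
  private val counter = new AtomicLong(0)

  // Call once per observation processed (e.g., per record consumed from Kafka).
  def recordEvent(): Unit = counter.incrementAndGet()

  // Call on a timer, e.g., every 10 seconds, passing the elapsed window in seconds.
  def check(windowSeconds: Long): Unit = {
    val observedPerSec = counter.getAndSet(0) / math.max(windowSeconds, 1)
    if (observedPerSec < expectedMinPerSec || observedPerSec > expectedMaxPerSec)
      println(s"ALERT: $observedPerSec events/sec is outside expected $expectedMinPerSec-$expectedMaxPerSec")
    else
      println(s"OK: $observedPerSec events/sec")
  }
}

object ThroughputMonitorExample extends App {
  // A connected-car feed expected to deliver 100,000 to 200,000 events per second.
  val monitor = new ThroughputMonitor(100000L, 200000L)
  (1 to 500).foreach(_ => monitor.recordEvent())
  monitor.check(windowSeconds = 10) // 50 events/sec, so this raises the alert
}
```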
There are some really cool DC/OS metrics capabilities in DC/OS 1.9 and later, and we actually use those to add application-level metrics. Our containers write application metrics into that, and then we can hook onto the event bus to receive them. So we've built our own tooling on top of that, and you could use those application metrics, for example, to say: if I've got a connected-car feed coming in and there's a topic for Audis or whatever it is, and the Audi topic is growing and my Spark streaming jobs aren't keeping up, I want to autoscale more instances of the Spark streaming job. So you could have the infrastructure in place to do that. We haven't gone to that level yet, but it's certainly possible to do.

The question is, do you have stats or metrics on how many megabytes or points per core you can do? I don't think we have it by core, but we've been able to scale up to millions of events per second on the sources and the Kafka brokers. Depending on the analytics that you're performing, it'll be drastically different: if you're doing analysis across geofences and there are 800,000 of them, and those geofences have 300 edges each, that's going to be drastically different than if they're just rectangles and there are 10 of them. So it depends on the analysis that you're doing, and it's dependent on the application, but we certainly do capacity planning and help customers with their use cases. The geofences can also be dynamic. It could be that Ricardo is walking around, and one of our analytics tools can create a service area, say where Ricardo could walk in one minute, so he gets a polygon around him, and I have a polygon around me, and I want to know if those two polygons ever intersect. In that case, the polygons are changing drastically all the time, because every time I move I get a new polygon. With those types of scenarios you can handle fewer events per second, because the polygons are constantly moving around and things of that nature. So it really depends on the use case, but we've been able to scale this up to millions of events per second, and it takes tens or so of nodes to do that; it took something like 10 Kafka brokers to get to millions of events per second.

Yeah, it's a great question, great comment. So the question is, are you doing stateful processing in your real-time analytics, and if so, how are you doing it? That's a problem we haven't yet solved in this new architecture. In our traditional product, we do stateful streaming analysis. If I need to know when Adam enters this room, I need to know his last position and his current position. Being able to do stateful processing is relatively easy; being able to do reliable stateful processing is very difficult. So that's a problem we're still figuring out: how we're going to do it with reliability. There are a number of factors. If you do that in a Spark streaming job, you need to have the partitions in Kafka set up such that you get the same track ID on the same Spark streaming job, so you can keep that state together. That's one way to start. But if that Spark streaming job fails, you want that state to be available on the other Spark streaming job that picks up the partition afterwards. So that's the harder problem: how do you actually synchronize that state?
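For illustration only, here is roughly what that keyed-state approach can look like with Spark's mapWithState, detecting enter transitions for a single hard-coded rectangular geofence. This is not Esri's implementation and it does not address the failover and state-synchronization concerns just described; the input stream is stubbed with an empty queue where the Kafka topic, partitioned by track ID, would normally feed in.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import scala.collection.mutable

object GeofenceEnterDetect {

  case class Position(lon: Double, lat: Double, t: Long)

  // A single rectangular geofence for illustration; real geofences are arbitrary polygons.
  def insideFence(p: Position): Boolean =
    p.lon >= -117.2 && p.lon <= -117.1 && p.lat >= 34.0 && p.lat <= 34.1

  // Keyed state per track: was this track inside the fence on its previous observation?
  val enterDetector = (trackId: String, pos: Option[Position], state: State[Boolean]) => {
    val wasInside = state.getOption.getOrElse(false)
    val isInside = pos.exists(insideFence)
    state.update(isInside)
    if (!wasInside && isInside) Some((trackId, pos.get)) else None // emit only ENTER transitions
  }

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("geofence-enter"), Seconds(5))
    ssc.checkpoint("/tmp/geofence-checkpoints") // mapWithState requires checkpointing

    // Stub input; in the real pipeline this DStream[(trackId, Position)] comes from Kafka.
    val positions = ssc.queueStream(new mutable.Queue[RDD[(String, Position)]]())

    val enters = positions.mapWithState(StateSpec.function(enterDetector)).flatMap(_.toList)
    enters.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```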
And if you do that, how does it affect your throughput? So there are a number of factors to consider. Given that, we've prioritized our stateful streaming work lower, and our customers have actually been accepting of that. The way we've compensated is that our batch analysis capabilities can be scheduled on a regular basis. If customers want to do stateful processing, they actually do it every 30 seconds or every minute: run a batch analytic, a Spark job, calculate and detect those things, and send the alerts out. It's typically acceptable that there's a 30-second latency before those alerts happen. So that's how we've compensated and how we've avoided having to solve that problem. It's a great question and a very difficult problem we've spent years on.

Yeah, so for the batch analysis: the Spark streaming jobs write the data into Elasticsearch, and then our batch jobs use Metronome, and they basically schedule themselves every 30 seconds or every minute to run and detect when Adam has entered something. That way it doesn't really matter which Spark streaming job processed it, and that's the reliable way we can do it. And if data comes in late, we can compensate for that as well, which is another problem when you're dealing with state in a stream.

We've got time for one last question. All right, who wants the last question? So the question is, how do you deal with out-of-sequence events when something goes down? That's another reason why we've compensated with a recurring batch analysis capability. Somebody needs to be thoughtful about how latent something can be: if there's potentially a 50-minute latency, then you want to run that analytic every hour rather than more often, so you can try to fit it in. There are always going to be circumstances where something comes out of order. Maybe somebody turns their phone off and it queues up the events, so that when they get connected again it delivers everything at once, and all of those come out of sequence because they get processed in parallel. So the better way to do it is to run recurring analytics through a batch recurring analytic task, and Metronome or Chronos are very good at doing that type of thing.

All right, I think that's all the time we have. All right, thanks, Adam. I'll stick around in the hallway. Thank you, guys. Thank you, guys.