Okay, I'll kick off. So good morning and welcome, everybody. Thanks for coming to this first session of the morning after the cocktail reception last night; I appreciate it. I'm Tim Morton, founder and CTO at Acunu, and I'm going to talk to you a little bit this morning about NoSQL and big data analytics: NoSQL's role in how you analyze data. It's all very well collecting data, or using NoSQL solutions to serve it, but where it gets really interesting, in my opinion, is when you start doing something with that data: when you start synthesizing it, summarizing it or processing it in some way to extract business value.

To give you a couple of sentences about Acunu: we're a company that provides solutions around real-time analytics, in particular streaming analytics. We provide software that allows you to build OLAP cubes on top of NoSQL, and I'll talk a little bit more about that later. But what I'm really going to cover in this talk is a history and a taxonomy of how analytics arrived in NoSQL.

My background: I actually started working with what I guess you would now call NoSQL during my PhD at Cambridge. I was part of the project which built the Xen virtualization platform, and as part of that we looked a lot at how you manage storage in distributed systems, in particular wide-area distributed systems. So I did some work, probably back in 2001, 2002, where we were developing the very first generation of wide-area distributed hash tables. Those were the systems that went on to influence Dynamo and Cassandra, and the design of many of the current generation of NoSQL systems that are commercially available today. I guess the interesting thing there is that back when we were doing that, we weren't really sure what the use cases were. That's research, right? You do it and you justify the use cases as you go. But it's great to see that sort of technology actually having real business application these days.

So I'm going to take you through a bit of the history of those systems, and in particular how analytics arrived there. This is the first talk of the day in a track of talks around analytics, so hopefully it will do a little to set the stage if you're planning to go to a few of those talks later.

I think in the beginning, as I alluded to from the research angle, NoSQL was mainly about storage. If you look at the abstract of the Bigtable paper, published by Google in 2006, the stuff they were talking about was really: how do you collect data (data with some structure to it), and how do you then build a system that serves that data back out? Analytics really wasn't a key part of the consideration. The consideration was: how do you deal with these volumes of data, these very, very large collections of data? And although the data we're talking about has some structure to it, this isn't really the domain of conventional databases. In particular, conventional databases offer a lot of features around making guarantees about the transactional integrity of your data. That's great if what you're doing is financial transaction processing. But if you're merely collecting data, or serving it back out, and not really manipulating it, then those things are unnecessary. So the key observation about systems like Bigtable is that you can drop those transactional semantics and get scale.
You move from the world of a scale-up architecture to a scale-out architecture by offering weaker semantics, which is fine for this particular domain. And that also applies to other similar systems that were around at the same time, for example Amazon's Dynamo, where the canonical use case was around managing shopping carts.

And that's quite interesting. To go into it in a bit more detail, I'll look at one of the examples from the Bigtable paper, which was around offering personalized search. Google built a system which allowed your search results to be customized using information about clicks you'd been making on Google and previous searches you'd done. If you look at what was happening there, they used Bigtable to collect these streams of user queries and clicks and organize them into a store. At that point, nothing was being done with that data; it was just sitting there. Then an out-of-band analytics process, implemented via MapReduce, took the underlying storage files sitting beneath Bigtable in GFS (which is the equivalent of HDFS inside Google), did some analytics on that raw data, and pushed the results back into a separate table in the same Bigtable cluster to serve from. So what you've got there is a system which is explicitly saying analytics is not in the domain of NoSQL. Quite an interesting separation. And why were they doing that? Well, in this particular use case there isn't particularly deep value in the timeliness of the answers. But that's obviously not always true for many of the problems we see as users in the NoSQL domain.

So one of the distinctions that I find very useful to think about, in terms of the problems you're trying to solve, is what sort of analytics you're really trying to do. Big data in particular is closely associated with Hadoop, the open source variant of MapReduce. For those problem domains, you're really talking about collecting a large amount of data and then mining it after the fact, to try to draw out the so-called unknown unknowns, the needles in the haystack: insight that could, over the long term, inform the business decisions you're making. It's typically a great setup for doing arbitrarily complex analysis on unstructured data. On the other hand, more the preserve of relational databases and the everyday line-of-business analytical applications that enterprises have been building on SQL databases for years, is what gets called operational intelligence. Here we're talking about: how do we do reporting, especially real-time reporting? How do we build dashboards? How do we provide the data necessary for applications or humans to make real-time decisions, or to raise alerts so that people can act on them, where timeliness matters? Where there's a business decision that could be taken better on the basis of fresh information, or where something has gone wrong and you'd like to take corrective action. What we're really doing is working, on the left-hand side, on data at rest, and on the right-hand side, on data in motion. On the left we're dealing with data that we've collected and perhaps archived over a while; on the right we're dealing with streams of data. And if you want to think of it in terms of the Vs, on the left-hand side you're potentially dealing more with variety, and on the right-hand side more with velocity. And of course the key concerns about data volume, and the inherent value of the data (being able to provide an economic solution for the amount of data you need to process), are concerns across the board.
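Before moving on, to make that out-of-band pattern from the personalized search example concrete, here's a minimal Python sketch of the idea. The names and data are purely illustrative; this is the shape of the architecture, not Google's actual systems.

    from collections import Counter

    # Raw events land in a store (Bigtable in the Google example);
    # nothing is done with them at write time.
    raw_events = [
        {"user": "alice", "query": "nosql", "clicked": "example.com"},
        {"user": "alice", "query": "hbase", "clicked": "hbase.apache.org"},
        {"user": "bob", "query": "nosql", "clicked": "cassandra.apache.org"},
    ]

    def batch_job(events):
        """Out-of-band analytics: runs periodically, not per request."""
        profile = Counter()
        for e in events:
            profile[(e["user"], e["clicked"])] += 1
        return profile

    # The serving table that the live site reads from only ever sees the
    # pre-computed output, never the raw event store.
    serving_table = batch_job(raw_events)
    print(serving_table[("alice", "example.com")])  # -> 1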
So I'm going to focus in this talk more on the right-hand side, because that's where NoSQL has really found a home in the analytics space. I'll go on to show you some of the systems and solutions that are out there, but if you look at where NoSQL has been deployed for analytics use cases, it's much more around the operational intelligence side. And that's really because NoSQL, from the beginning, even when it was a storage solution, was looking to tackle data sets with some structure to them, that structure being the key facet which allows you to go in and do random access on your data and pick out individual results, in the same way that traditional databases can.

But as a significant distinction and a big departure from traditional databases, in the NoSQL world we often aren't providing full normalization of data. Think about what happens in a database: you build a relational schema which fully normalizes the data and organizes constraints between the various categories of data you're working with. That has some great features. In particular, it means you can build query processing languages on top, languages like SQL, which are tied tightly to the storage, and that gives you great flexibility in the questions you're going to ask. But it also means you end up doing a lot of work to answer those questions when you're dealing with very large data sets.

Consider a setup where you're processing a stream of events, say a stream of tweets; often it's machine-generated data, or exhaust data of some sort. For every update you might only do a few random writes (you're updating data and you're updating indexes), but those might be spread across different machines, and they're certainly in different places on your disk. The challenge really comes when you're trying to make a query. If I were to build something like Twitter in a fully normalized relational database, I would be doing a lot of random access, doing a lot of work to pull all of that data back from across my database as I went. And that's quite a challenging situation if what you're trying to deliver to your users is a low-latency experience.

So what we've often found in the NoSQL world is that we adopt this idea of denormalization. Denormalization is where you think carefully, in advance, about the way data is going to be read back, and you write data according to those patterns. It's something that has been around for a long while in the relational world, but has always been frowned upon, as we'll come to see. It goes a little bit like this: you take an event and you think, how is that update going to be accessed later? My tweet might appear in many, many users' tweet streams, so the update could do potentially many writes. But the way the characteristics of disks have changed over the last few years means that storage capacity is cheap and sequential IO is cheap, while random access, in particular random reads, is very, very expensive. And to some extent that's still true of SSDs, because of their asymmetric properties in terms of reads and writes.
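Here's a minimal sketch of that fan-out-on-write idea in Python, with a plain in-memory dict standing in for the distributed store. A toy, not anybody's production code, but it shows the trade:

    from collections import defaultdict

    followers = {"tim": ["alice", "bob"]}   # who follows whom
    timelines = defaultdict(list)           # the denormalized read path

    def post_tweet(author, text):
        # Write-time work: one cheap append per follower...
        for follower in followers.get(author, []):
            timelines[follower].append((author, text))

    def read_timeline(user, limit=20):
        # ...so read-time work is a single lookup in a single place,
        # instead of random reads scattered across the cluster.
        return timelines[user][-limit:]

    post_tweet("tim", "denormalization in action")
    print(read_timeline("alice"))  # [('tim', 'denormalization in action')]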
So what we're really doing is transforming a few random writes into many sequential writes, which is just better for the world in general. And when you're doing a query, we go to one place in the database and pull out the specific timeline for that particular user, so the amount of work we have to do at query time is much more manageable. This is absolutely key as your data set volumes grow: there's no way you can answer queries in a timely fashion if they have to touch every element in a potentially multi-terabyte, multi-machine database.

The interesting thing about denormalization in that sense is that we're organizing the data we collect into a structure we can return. If you think about it, we're not really storing the raw data, the fundamental building blocks of queries; we're storing the data that's going to form part of the response. We're not collecting the input data, we're collecting the answers.

One interesting feature shared by a number of these emergent NoSQL systems is the notion of distributed counters. Counters are really the building block upon which all of NoSQL analytics is built, and I guess the reason for that is they let you use NoSQL systems as a store for the small components that make up the answers you want to serve back to your users. Think about what would happen here: I have this tweet, but what I'm looking to do is count impressions on it. I want to see the number of people that received it, I want to look at reach. And what I'm really inserting into an analytics system is not the tweet itself, but just a bunch of plus ones. Say I want to count the total number of tweets, or break tweets down by day: I'm really just inserting simple counts. And that's interesting, because when I come back to do a read on this data, I can just pull out a single number. If you've ever used Hive over Hadoop, collected tweet stream data and then asked it how many tweets are in your Hadoop cluster, that can take a little while, and the reason is that you're not giving Hadoop any context in advance that this is a query you might like answered. When you're building analytical applications, however, it's very, very common to know the pattern of accesses you're going to require ahead of time. I'll come on to show you how Twitter used exactly a system like this to provide advertising analytics; by using the techniques of denormalization, you can provide a better experience for your users' analytical applications.

As I mentioned, these sorts of features are present in a number of different NoSQL databases. Cassandra has had atomic counters for a while, as has HBase. Riak has just added them in 1.4, which I think is the latest release. Accumulo has them as well, and probably others. They're a very useful building block for a wide range of analytics. In fact, the Cassandra implementation of distributed counters was built by the community, but with substantial contributions from Twitter, with particular application to one of the solutions they were building to provide analytics to their end users on promoted tweets.
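As a toy illustration of that plus-ones idea, here's a plain Python Counter standing in for, say, Cassandra's counter columns. Made-up names, not Twitter's actual data model:

    from collections import Counter

    counters = Counter()

    def record_impression(campaign, day, user):
        # The tweet itself never enters the analytics path; each event
        # just bumps a handful of pre-chosen counters.
        counters[("impressions", campaign)] += 1       # running total
        counters[("impressions", campaign, day)] += 1  # broken down by day
        counters[("reach", campaign, user)] += 1       # per-user, for reach

    record_impression("promo-42", "2013-11-05", "alice")
    record_impression("promo-42", "2013-11-05", "bob")

    # "How many impressions today?" is a single key lookup, not a scan.
    print(counters[("impressions", "promo-42", "2013-11-05")])  # -> 2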
So here is a picture of a promoted tweets dashboard, from a talk Twitter gave a couple of years ago about how they built exactly this sort of system using Cassandra. The key observation is that everything in this analytics application is basically a count. You're looking at the number of tweet impressions, you're looking at the number of clicks, you're breaking that down by date and by user, you're measuring reach. All of that is just quantitative analytics; not complex machine learning, just quantitative analytics. And it's a simple system built on top of Cassandra.

Likewise, you might have read in the last six months about a system Facebook has built called Puma, which does a relatively similar thing, this time on top of HBase. It's actually used to power Facebook Insights: it gives information about how domains and URLs are being talked about on Facebook. And on the right-hand side you can see a picture of a system called ODS, which is an internal metrics system at Facebook. Facebook use it to track all of the metrics data coming out of the infrastructure, the hosts and the services and the applications running all across Facebook, and they use it to detect failures and correlated outages, and to raise alerts on the back of them. You can see a picture of a heat map highlighting various metrics coming back from a particular Java application there. And all of that is built simply using these atomic counters.

So, to go back to normalization and denormalization: every single one of these applications is denormalization. Everybody who's done a CS degree probably has An Introduction to Database Systems by Chris Date lying dusty in their attic or somewhere on a shelf, and the general opinion and consensus in the database community is that denormalization is a bad thing; you should not do it. But if you look around at the large web organizations, in particular the organizations who've been pioneering systems that tackle the challenges of the so-called data deluge, these guys are using denormalization all over the place. So what gives? I apologize to Eric Evans, if he listens to this recording, for using this picture. I guess the key thing to observe is that denormalization has its challenges, but it's really the only way to tackle data sets where you want timely answers, you're dealing with very, very large volumes of data, and you have some idea in advance about the sort of analytics you'd like to look at.

There is a challenge with denormalized systems, however. Remember, the key tenet is that you think about exactly how your data is going to be accessed ahead of the time you put it in. So that's great: you build something like Rainbird or like Puma, and then your boss comes along and says, actually, you know what, the system's great, but it would be really nice to have this different dashboard. And you think, okay, fine, how do we do that? Well, you're basically back to writing code: changing your data model at the HBase or Cassandra or Riak layer, and looking at exactly what you do with every event in order to maintain the right set of counters so that you can do the right reads.
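To sketch why that's painful (hypothetical names, but this is the shape of the hand-rolled write path you end up maintaining):

    from collections import Counter

    counters = Counter()

    def on_event(event):
        # Existing dashboards: totals and a per-day breakdown.
        counters[("clicks",)] += 1
        counters[("clicks", "day", event["day"])] += 1
        # The boss's new dashboard: clicks by country. Shipping this line
        # is a code change, and it only covers events from deployment
        # onwards; historic events have to be replayed through it by hand
        # before the new view is complete.
        counters[("clicks", "country", event["country"])] += 1

    on_event({"day": "2013-11-05", "country": "GB"})
    print(counters[("clicks", "country", "GB")])  # -> 1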
And that's quite painful if you come from a background where you're used to relational database technologies and the richness of queries that SQL brings. SQL queries may occasionally be slow over large data sets, but at least you can do them. The challenge now is that your boss's new dashboard entails stopping your development team, scheduling that feature and building it. And worse, if your boss then says, you know what, I'd really like historical data in that dashboard: well, even if we collected the data, I'm now going to have to rearrange the existing data in the system, or update its contents, to provide that new view. And then, just to add insult to injury, your boss is almost certain to say: you know, this system is really great, I'd really love to be able to create dashboards myself. Can it talk SQL? And that's where it gets trickier. So there are big challenges with agility and denormalization.

One other related set of techniques that has been around for a while, but that Nathan Marz (who I think is talking at this conference as well) has written a lot about, is the so-called Lambda architecture, where the idea is that you use a batch layer and a speed layer to combine ad hoc results and polish them up with real-time answers. You can see, this is an image drawn by a computer science grad student, link at the bottom, of just a simple Lambda architecture he put together for a project, and you can see that it's actually pretty complicated: there are lots of moving parts in here. It's not a new idea, and it's quite interesting that back in the early 2000s, Google would go several months at a time without updating their core search index, and instead put the flourishes of real-time answers on top by overlaying them. That's exactly what this is suggesting. The challenge is that, in general, it's very hard to compose a batch system's answers with a real-time system's answers and get something meaningful. Say we're trying to count unique visitors, something very simple, like Google Analytics does. My batch system tells me that historically we got 83 unique visitors, and my real-time system has counted 17. Does that mean there are 100? Are they unique? It's not always easy to simply combine those statistics. So this is an interesting architecture for certain applications, but in general it would be nice to have something somewhat simpler.
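Here's a tiny sketch of that composition problem, using exact Python sets for clarity. In practice you'd keep a mergeable summary in both layers, such as a HyperLogLog sketch, rather than the final numbers:

    # Distinct counts don't add: the same visitor can appear in both the
    # batch window and the real-time window.
    batch_visitors = {"alice", "bob", "carol"}  # batch layer: 3 uniques
    speed_visitors = {"carol", "dave"}          # speed layer: 2 uniques

    naive_total = len(batch_visitors) + len(speed_visitors)  # 5: wrong
    true_total = len(batch_visitors | speed_visitors)        # 4: carol overlaps

    print(naive_total, true_total)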
So, as a company, we've done a lot of work with Cassandra over the past few years, and we worked particularly closely with a couple of large telcos collecting call detail and event detail records, with a small organization doing visitor analytics, advertising analytics and real-time A/B testing, and with a couple of others. All of them were really struggling to build these analytical applications on top of Cassandra, and there's nothing specific to Cassandra here; it could just as well have been HBase or any of these other solutions. The challenge they were having was really that this was a very low-level primitive. They were struggling with the idea of denormalization: they recognized they needed it, but at the same time they wanted the richness of SQL. So we ended up productizing what we built for them, and that's really the genesis of the Acunu Analytics product, which I'll talk about very briefly.

What we try to do is give you the benefits of rich queries, but in the spirit of denormalization. We've done that by introducing real-time OLAP-style cubes on top of NoSQL environments, as I mentioned earlier. The basic idea is that you define an aggregate cube: you give us information, in advance, about what you care about seeing. Maybe you're looking at approximate top trending hashtags, or perhaps you're counting visitors, or measuring the rate of change of call drops or call length in particular geographic areas, something like that. You basically create an aggregate and give it a list of dimensions: the dimensions you're going to slice and dice it by. Those two fundamental pieces allow us to do a small amount of work as every event comes in to update these cubes, and we do that work incrementally and continuously. You've got a stream of data, and we continuously update those cubes; perhaps we maintain multiple cubes that are looking at different aggregates and sliced across different dimensions.

We also store the raw events, for a couple of reasons. The first is that we maintain a mapping from the aggregates back to the raw events, so you can drill back down and look at the raw events that contributed to a particular aggregate. But the key thing here is that this is denormalization, right? This is giving you exactly what Rainbird or Puma have done. The point is that we also offer a query language which does as much as possible at query time: we allow you to compose these building blocks of lookups on the cubes with the traditional, familiar concepts of SQL, like joins, HAVINGs, ORDER BYs, arithmetic operations on the aggregates, and so on. So the thing to note is that you keep pretty rich queries, but at the same time those queries don't touch the raw data every time you need an answer; they touch the aggregate cubes, which we're building and maintaining in real time. Data coming in can typically feature in query results within fractions of a second, and the queries themselves tend to take milliseconds, or tens of milliseconds.

I won't dwell on the further details, but one more thing to say: beyond the drill-down, the point is that if your boss comes along and asks for a new dashboard, you can take those stored events and we can programmatically backfill them into the cubes you're newly creating. A new cube will always be up to date for data arriving from that point forward, and we can backfill it from the historic data store. So we free you from writing the low-level code that deals with plus ones, from building complex data models and managing them in Cassandra or HBase, and from having to write scripts to make those cubes reflect historical results.
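As a toy illustration of the cube idea in a few lines of Python (emphatically not Acunu Analytics' actual API, just the shape of it): an aggregate plus a list of dimensions, updated incrementally per event, with retained raw events enabling backfill of newly defined cubes.

    from collections import Counter

    class Cube:
        def __init__(self, dimensions):
            self.dimensions = dimensions
            self.cells = Counter()

        def update(self, event):
            # A small amount of incremental work per event.
            key = tuple(event[d] for d in self.dimensions)
            self.cells[key] += 1

    raw_events = []  # retained, for drill-down and for backfill

    def ingest(event, cubes):
        raw_events.append(event)
        for cube in cubes:
            cube.update(event)

    by_tower = Cube(["cell_tower"])
    ingest({"cell_tower": "T1", "day": "2013-11-05"}, [by_tower])
    ingest({"cell_tower": "T1", "day": "2013-11-06"}, [by_tower])

    # The boss's new dashboard: define a new cube, then backfill it from
    # the retained raw events so it reflects history, not just new data.
    by_day = Cube(["day"])
    for e in raw_events:
        by_day.update(e)
    print(by_day.cells[("2013-11-05",)])  # -> 1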
So I won't talk much more about Acunu Analytics, other than to say that we use Cassandra under the hood to store raw events and aggregates, we have integration with Flume and Storm, and we also have a bunch of rich dashboards, because the next thing our customers asked us, of course, was: it's all very well being able to get these results, but how do I see them? How do I visualize them? And, in fact, how are we doing for time? I've actually got a few minutes, so I'll give you a quick demo shortly, but before I do that I just want to talk about some of the conclusions here.

NoSQL was originally designed as a way of giving you access to semi-structured data, either from a data collection point of view or from a serving point of view. The problem when you try to build analytical applications on those systems is that you can't have unplanned queries that touch all of your data. It just doesn't work. So you've got a few choices in how you architect solutions. You can do analytics out of band: to go back to the Google personalized search use case, that works very well and is a simple architecture if timeliness is unimportant, if you're not really that interested in real-time answers for the particular problem you're solving. Alternatively, you can use atomic counters: they're a great way of pre-materializing the answers to quantitative analytics, and you get the classic properties of NoSQL solutions, high scale, high performance, and, certainly with Cassandra and maybe some other NoSQL systems too, high availability. The real problem there is that you've got to think carefully about flexibility: about how the goalposts are going to move once your organization gets its hands on this great new thing you've built. The Lambda architecture is an interesting approach if you want to do some really complex analytics. If your problem isn't amenable to just counting, if you want to do natural language processing, or complex data mining, clustering, recommendation, and you need a flourish of real timeliness to those answers, then a Lambda-type architecture, where you use a batch process to build results from a historical, data-at-rest store and then elaborate on them with real-time answers, is potentially a good way to go. You just need to think carefully about how to combine the real-time results with the batch results in a meaningful fashion. And finally, if what you're really looking for is an OLAP-style cubing engine on top of a NoSQL environment, SQL-like queries, nothing significantly more complex than SQL, but you do need instant answers, then Acunu Analytics could be a useful building block.

So I'll just flick through to a demo, to show you how some of these things can go. I guess almost certainly everything's on the wrong display... okay, I'm just going to try a bit of monitor gymnastics mid-talk. Hey, good. So this is Acunu Analytics. I've got a node set up on which I've pre-built a table. A table has a bunch of dimensions, which really say: when you get data in, treat these particular fields in a particular fashion. And we have a set of cubes. This is actually a mock data set, built on the back of some work we were doing with the telco I mentioned, looking at call drop rates across various geographic areas. We had geo-located call detail records, and they wanted an operational display of how call drop rates were associated with geographic areas, and with cell towers as well. I just built this schema from this particular file here, this demo schema: I create the table and add these cubes. But actually I can do this automatically, because the alternative you're thinking about here, right, is: how would I do this in my favorite NoSQL environment?
Well, here you can just take some events and throw them in, and it'll deduce the format of the table for you, and give you the option of inserting those events using a backfill process after creating the table. I won't do that now; what I'll do instead is, am I on the right machine? Yep. Okay, I'll start inserting some events, and yeah, we're getting some, that's good. So then I can go and do stuff like build a cube. Tell you what, I'm just going to count the number of events, and show you the structure of those events: they basically contain a latitude, a longitude, a timestamp and a duration. So I could do something like group by latitude and longitude. I won't do anything there; yeah, the only challenge is that a line chart is not the best fit for that. What we've got here really is a table of latitude and longitude against count, and you can see that, because we're still only inserting a little bit of data, we haven't got much there. But what you've got under the hood is a SQL-like query, which has WHERE clauses and GROUP BYs, and where you can build much richer SQL-like expressions. There's an HTTP API underneath this as well, so anything you're doing here you can do from your own application.

I could then turn this into, say, a map, and add that to a dashboard. Astute observers will have seen that that's San Antonio. And then I can just have that refreshing in real time as I'm collecting data. One of the other things you can do (I can change it, I can add multiple series) is embed this. We have a JavaScript library, which means you can take this and embed it in your own web application, and get from a high-velocity stream of JSON events through to displaying a real-time geographic heat map in your own application within, I think it's taken me about three and a half minutes so far. And I mentioned drill-down: I can click on one of these squares and see the original events that contributed to that particular aggregate down there.

So this is a potentially useful building block if you're thinking about real-time, SQL-like analytics on NoSQL, and if you really care about focusing your time and your team's time on the development of your application, rather than on building what I would describe as an infrastructure component: bringing back the richness of SQL while letting you leverage the scale, availability and performance of a NoSQL environment.

A couple more things. We talk about cubes, and we have some nice Rubik's cubes, which are going like hotcakes, I think, on our stand, so come along, we're in booth six, and take a look. One of our engineers is also quite good at these Rubik's cubes, so if you can guess the amount of time it took him to solve one, you can win an iPad mini; just enter your name into the draw. And I think one of our customers, Hailo, is also talking a bit later on. I think it's at 10.30, but I've lost the details; could be 10 o'clock, could be 10.30. Hailo are an Accel-funded mobile taxi app. They use a whole range of really interesting technologies, including Cassandra, NSQ and Acunu Analytics, and they're going to be talking about how they've built their architecture for scale, because they're growing very, very rapidly.
And Acunu Analytics is a big part of that in terms of monitoring, from the infrastructure side, all of the metrics coming in from very low-level system components, right up through to business KPIs. They'll talk a little bit more about that. Thank you. So I think we've got a few minutes for questions.

Hi. Sorry, actually, just a second: have we got a microphone that we're bringing around? Okay, I'll just repeat your question back. How far do NoSQL databases go in supporting self-service analytics? In terms of BI tools, you mean? Yeah, that's certainly a big area that's lacking at the moment in this world. One of the consequences of moving away from a single standardized, well-supported language like SQL, to more domain-specific interfaces, which on the one hand allow you to break the transactional assumptions of SQL and get scale, but on the other hand mean that you don't have SQL anymore, is that none of the tools you were used to (the visualization tools, the BI tools, the ETL tools, the integrations with data warehousing) exist any longer. So from a NoSQL environment point of view there are relatively few: not many BI tools can directly access data in NoSQL environments yet. So that would be, I guess, the self-service side of things.

It's worth saying that there are other systems which support a more relational-database style of queries over normalized data. MongoDB, for example, has an interesting aggregation framework, which takes a similar approach to a traditional relational database in that it gives you a rich set of queries on semi-structured data, builds indexes, and, without you having to predict or know anything in advance about the queries you're going to ask, lets you ask relatively rich queries over that data. However, that obviously comes at a cost, in the sense that aggregation queries in Mongo are typically broadcast to every node in the cluster (not always, but mostly), and end up having to touch a lot of the original data to get answers. Unfortunately, there's no free lunch in that sense. Thanks.
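For a flavour of what that looks like, here's a short sketch using the standard MongoDB aggregation pipeline from Python. It assumes a local mongod and a hypothetical "tweets" collection with a "day" field; only the pipeline operators themselves are MongoDB's real API.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    tweets = client["demo"]["tweets"]

    # Count tweets per day, most recent first. This is computed at query
    # time over the raw documents: rich and ad hoc, but in a sharded
    # cluster it typically fans out to every shard.
    pipeline = [
        {"$group": {"_id": "$day", "count": {"$sum": 1}}},
        {"$sort": {"_id": -1}},
    ]
    for row in tweets.aggregate(pipeline):
        print(row["_id"], row["count"])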
Hi. No, so every event that comes in, what we do in Acunu Analytics is look at the event, look at the cubes that we have, and make a number of writes into Cassandra to maintain, essentially, our materialized, denormalized views. Yeah, exactly. So we're basically a layer over this distributed counter system. If you come from a Cassandra or HBase world, you can think of us as handling all of that data modeling for you: you just say what high-level shapes of queries you're interested in, and we do all the data modeling. We build the queries, we build the structure of the data we maintain in Cassandra, and we access it directly. Acunu Analytics is really just a mapping from events into writes into Cassandra, and from queries into reads on Cassandra, but from that we can compose up a pretty rich set of queries. And you can use the standard operations tools, to back up and restore, for example. Thanks.

Hi. Do you mean in Acunu Analytics specifically, or in any of these systems? Okay, sorry, yes: the question was, how is security addressed in Acunu Analytics? So Cassandra itself has a number of ways of securing, with encryption, the communication channels between Cassandra nodes, and between Cassandra and its clients, and Acunu Analytics is essentially a Cassandra client. You can go further: DataStax Enterprise, which is a great distribution of Cassandra, actually supports on-disk encryption as well. Acunu Analytics facilitates client-side authentication, and in the upcoming release, in a few weeks' time, we'll be adding role-based access control on a per-table basis. The step after that on our security roadmap is to allow access control at a per-WHERE-clause granularity, so you can set up individual users with access to specific values, individual values in a WHERE clause. That's great if, say, you have 100,000 customers and a mapping of who's allowed to see which customer's data, and you want to be able to store all of that in a single cluster. Thanks.

Hi. Yes, yes, good question: how does the Lambda architecture relate to Acunu Analytics? The Lambda architecture, from the diagram you can see up here, consists of a batch layer and a speed layer. The batch layer is the line at the top, around Hadoop and batch views; the speed layer is the piece at the bottom, where you take a stream of data into something like Storm or S4, build real-time views, and then, when a query comes in, combine the two. Acunu Analytics is basically just a richer speed layer: you'd strip out the whole thing at the top and have Acunu Analytics be the speed layer. The real-time views are maintained in Cassandra, so they're persisted; you've still got them despite node or data-center failure and so on, and you've also got continuous access to historic results. So I guess one difference is that when you come in and ask a query, there's no difference in Acunu Analytics between data that arrived only a few milliseconds ago and data that arrived weeks or months ago. Very often our customers are doing comparisons with historic baselines. In investment banking, say: has the volume of trades we're seeing right now reduced or increased by more than two standard deviations from what the volume usually is for this symbol on a Tuesday morning at this time? That sort of historic baseline is used to understand whether there's an outlier or an anomaly. So yeah, we're basically just the bottom half of that picture.

Yes. Yeah, sorry, I didn't go into too much detail on that slide. Acunu Analytics is basically three layers. There's a Cassandra cluster: Cassandra is a great solution for scaling out; it's a multi-master architecture, and a single cluster can span multiple data centers. Acunu Analytics itself is scalable; it behaves like a Cassandra client, so it sits on top. And the dashboards all run client-side, just in a browser. And I think we are out of time, so if you've got any more questions, please just stop by our booth. Thanks.