Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor at DataVersity. We would like to thank you for joining today's DataVersity webinar, In-Memory Computing: Myths and Facts, sponsored by GridGain. Just a couple of points to get us started: due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A section in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataVersity. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar.

I'm proud to introduce our speaker for today, John Webster, GridGain's Vice President of Product Management. John brings significant expertise to GridGain Systems, with over 15 years of senior global technology and sales experience at notable companies including Netuitive, Microsoft, AmberPoint (acquired by Oracle), BEA, and SilverStream Software (acquired by Novell). John has also held a number of technical roles; he was an early adopter of Java for use in an online gaming startup he co-founded in 1996, while also developing his expertise in J2EE and Java application servers. John holds a BS in Computer Science from the University of Western Ontario, Canada. And with that, I will give the floor to John to start the presentation. Hello and welcome, John.

Shannon, thank you so much for the kind introduction; I'm happy to be here today. Thank you, everyone, for joining. Hopefully I can cover some of the common myths and facts of in-memory computing and dispel some of what I see across the industry as common misconceptions. As Shannon mentioned, I'm Vice President of Product Management here at GridGain Systems. We are a leading provider of in-memory computing platforms and technologies, and I'm happy to be sharing some of our experience with everyone today. Even though everyone is muted, I do enjoy questions and I'm happy to answer them as we go. So if you think of something, make sure you queue it up and put it into the chat section and we will address it along the way.

So I thought what would be useful is to give you a definition, or at least our definition, my definition, of in-memory computing. It's obvious to everyone in the industry that in-memory computing has come to the forefront, with major vendors like SAP, Oracle with its recent announcements, and many other startups here in the valley and elsewhere doing a lot of work in memory. And of course it seems intuitive that if I'm talking about in-memory, then it obviously must be faster. Our formal definition is: using high-performance, integrated, distributed in-memory systems to compute, process, analyze, and transact on large-scale data sets in real time. I don't want to digress into a discussion of what deterministic real time means to, say, the high-frequency traders in New York, but think of real time as essentially application time, in the millisecond-to-nanosecond range. That is the sort of SLA that in-memory computing targets.
It's the kind of workload in-memory computing is ideally suited for, and one you often can't approach at all without some type of in-memory computing platform; it's orders of magnitude faster than legacy disk-based systems. A couple of things I'd like to point out here. This is not about partial caching of data. It's not even about simply having a database residing in memory, although that certainly gets us closer to an in-memory computing platform, and it certainly is a necessary component of any in-memory computing that you're going to be doing. The point with in-memory computing, for me, is that the primary data set is in memory. We design our product with a memory-first architecture, and what I mean is exactly that: I assume that my data set, my entire data set, is residing in system RAM. That's important, and we'll come back to it in a second when we compare it to a lot of the flash technologies that you're seeing now. This really is not just about being faster. Caching technologies are about being faster; they're about faster access to disk-based technologies, about making up for deficiencies in disk. What I see in my customer base is that in-memory is very much transformational. These are new types of workloads, new types of problems that we're approaching, problems that have presented themselves in the last 18 to 24 months as things like sensors and streaming data have come more and more into all of our environments.

Now, the other thing that you are seeing, and we are certainly seeing, is that the time is right, if you will, for in-memory computing; the time is ripe for a change. And the reason for that is a truism we all know: data is exploding, but data types are exploding too. Event data, clickstream data, all of the exhaust, if you will, from network events, as all of these systems and our lives become more and more instrumented. At the same time, the cost of memory keeps dropping, which is accelerating the adoption of flash-based technologies and in-memory technologies, which we'll talk about in a second. And study after study from the analyst firms, certainly substantiated by our customer base, says that some kind of in-memory computing system is one of the top corporate imperatives. In-memory databases in particular are very prevalent; we've certainly heard a lot about that with SAP HANA, for instance. It's one of the top corporate imperatives we're seeing across the industry and certainly across our customer base.

I'd like to point out a couple of things to highlight how profound I think this change is actually going to be for all of us in the industry. Back in the 1970s, for those of you on the line who were working then, and a bit of a history lesson for those who weren't: IBM released the Winchester hard drive. That's roughly 40 years ago, which is not that long ago, and it really ushered in the era of hard disks. And if you look at what happened to storage on hard disks, a gigabyte of storage is effectively free at this point from any corporate standpoint. I will bet everybody on the phone has a terabyte or two in random external drives sitting around in their basements or offices. So that's what the introduction of hard disks did.
Obviously, they were very expensive at the beginning. What you saw was the decline in the utilization of tape, which was really the de facto medium that everyone was using. You also saw the introduction of SQL, which ushered in the era of structured data, and of course all of the things that have followed on from that. So what you've got here in the 2010s is 64-bit CPUs and DRAM prices that are dropping 30% year over year, and my contention is that this has ushered in the era of memory. Accordingly, I expect you to see hard disks start to decline. Now, let me caveat that for a second: I'm not saying that hard drives won't exist anymore, and there is still going to be plenty of disk-based storage for all of the things that we need. But what I am saying, and much of the industry agrees with me, is that RAM is really the new disk, and disk is really going to become the new tape, where you put things for long-term durable storage and for disaster recovery, for when you need to turn the lights back on and put all those things back together again. So you are seeing things like NoSQL and NewSQL coming along, and you're really about to see the era of unstructured data. Again, SQL is not going anywhere; SQL will certainly be with us. My overall point is that this is a profound change, on the level of the web, on the level of cloud, and the impact on our industry, our technologies, and the way that we build applications is going to be correspondingly profound.

So what am I talking about with memory-first versus disk-first? Well, really, this is about primary storage; I mentioned that already. Essentially, disks are for backups. Again, disks are virtually free, and we're actually seeing a decline in the use of disks for high-performance workloads. If you think about it, the latency of accessing data from memory is measured in nanoseconds; you're really just doing pointer arithmetic, and it's an API call to get your data. When you're using disk in a disk-first architecture, which is what you see in most traditional RDBMS systems and most of the things we've built out over the years, it's an API call that engenders some sort of OS call. You're going to go through the I/O stack, you're going to hit some hardware, you're going to be on a different bus, and ultimately you're going to pick things off of either a spinning disk or a solid-state drive, which of course is faster than spinning disk, but you've still got latency in milliseconds, if not worse, depending on what you're retrieving. And an important thing I always like to say: just because you can scale doesn't mean that you're in-memory. Just because you've got a cache, for instance, or just because you've increased performance, that is not synonymous with being in-memory.

One thing that's also different about an in-memory computing technology stack is that it inverts what you might call the normal or standard application architecture. By that I mean that with anything from client-server through J2EE and various other models, your data is always moving around. You've got data written in some sort of repository: a database, a system of record, a flat file, whatever it might be.
In order to process or do something, you're looking into that database, taking the data out, processing it on your application tier, maybe putting result sets back, maybe updating data, returning result sets, whatever it might be. The point is that things are moved. They're not partitioned; they're in a centralized database. And that's not to say the database isn't clustered; obviously it often is, and we know that's a very standard thing to do. But you can't actually send the application to the data in the way that you can with in-memory computing.

Now, let's talk a little bit about Hadoop, because Hadoop is a great example of exactly what I'm talking about. One of the advantages of Hadoop, and in my opinion the reason it is as popular as it is, beyond the cost and durability and sort of cheapness of it, is that Hadoop has essentially democratized parallel computing. Folks in this industry have been doing parallel computing for decades in some cases, but traditionally it has been very challenging, difficult, and limited to some very specialized use cases before you could actually do true parallel processing. In terms of Hadoop and MapReduce, MapReduce being a very common and easily understood notion of parallel processing, Hadoop does this over spinning disk. It takes things from days to hours, from hours to minutes. It was never designed for real-time workloads, but it is very much about bringing the computation to the data; that is in fact what MapReduce does. Now, that's not the only way to do parallel processing, and with in-memory computing you have that same notion: a partitioned data set, the data residing in memory, and then an ability to compute or transact on that data. We certainly do MapReduce, we do MPP, and we do a variety of different kinds of parallel execution on top of these things. The difference is that by sending the compute to the data, you can get what you might call almost unlimited scale. If I can speak about our own product for a moment: I've never been unable to scale; I've never reached a theoretical maximum in terms of scalability. And that's because of the memory-first architecture we've put together, and you can very much approach the theoretical minimum in terms of the latency to access data. I'll share an anecdote with you: one of the things we always say at GridGain is that the first law of distributed programming is don't distribute, and MapReduce was really one of the first popular paradigms to respect that, by moving the computation rather than the data.

So my overall point is that data alone is not enough; it's not sufficient. An in-memory database, in and of itself, is not enough. You absolutely need a compute component as well.

So let's talk about the first myth that I always hear. It goes something like this: this all sounds great, I want to be able to do things differently, I want virtually unlimited scale, I want to do it on commodity hardware, I want to scale up and scale out, sounds great, but it's just so expensive. And that is certainly a legacy of where we were 24 to 36 months ago in terms of DRAM prices and what these kinds of systems used to cost.
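Before we get to the cost question, here is a minimal, generic sketch of the "send the compute to the data" idea just described. To be clear, this is not the GridGain API; it simulates a partitioned in-memory data set (four "nodes" held as local maps) in plain Java and ships a small aggregation task to each partition, so that only tiny partial results cross the "network". All class and variable names are illustrative.

```java
import java.util.*;
import java.util.concurrent.*;

/**
 * Minimal, generic sketch of "send the compute to the data" (not the GridGain API):
 * the data set is partitioned across nodes (simulated here as in-memory maps),
 * and each node runs the aggregation locally, returning only a small partial result.
 */
public class ComputeToData {
    public static void main(String[] args) throws Exception {
        // Simulate 4 nodes, each holding its own partition of the data in RAM.
        int nodes = 4;
        List<Map<Integer, Double>> partitions = new ArrayList<>();
        for (int n = 0; n < nodes; n++) partitions.add(new HashMap<>());

        // Partition 1,000,000 values by key (the "affinity" idea).
        Random rnd = new Random(42);
        for (int key = 0; key < 1_000_000; key++) {
            partitions.get(key % nodes).put(key, rnd.nextDouble());
        }

        // Map phase: ship a small closure to each partition, not the data to us.
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        List<Future<Double>> partials = new ArrayList<>();
        for (Map<Integer, Double> partition : partitions) {
            partials.add(pool.submit(() ->
                partition.values().stream().mapToDouble(Double::doubleValue).sum()));
        }

        // Reduce phase: only tiny partial sums cross the "network".
        double total = 0;
        for (Future<Double> f : partials) total += f.get();
        System.out.printf("sum over 1M in-memory values = %.2f%n", total);
        pool.shutdown();
    }
}
```

The inverse pattern, pulling every partition's values back to a single caller and summing them there, is exactly what saturates the network as the data set grows.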
Now, what I can tell you for a fact, because I just wrote the check for this for one of our engineers: this is a terabyte DRAM cluster, about 10 blades, and we were able to buy it for about 25K, including shipping, racking, and stacking it in our data center. So it's very, very cost effective. You can scale up to 10 terabytes for roughly a quarter of a million dollars; that's really not too much. You can absolutely approach this in a very incremental, affordable fashion. And it's interesting: it wasn't that long ago that a terabyte of RAM was the purview of giant, resource-rich organizations, but you can have this in your data center very, very simply. By 2015, we expect this price to drop even further; on the hardware-enablement side, we expect to see a terabyte on commodity blades for around 10,000 bucks. Again, do the math on that in terms of your data set and what you actually want to do. Traditional technologies, and even the big data technologies, can't deal with the latencies these workloads demand, which makes this very much about in-memory and very much about DRAM.

And then there's a very interesting thing coming along now that we're seeing a lot of: memory channel storage. This is not PCIe-based stuff; this isn't flash SSDs or solid-state drives. What I'm talking about is flash in a DRAM form factor. It's about twice as fast as PCIe flash, and at pretty much the same price point. What that allows me to do is have a very large amount of memory available to me, with favorable consequences in terms of price, because flash itself is obviously a cheaper technology; you don't have to spend as much money for it as you do for DRAM. So you've got a very interesting situation where a lot of very cost-effective options are coming into the market, with power and cooling benefits too; I've seen things like 512 cores and 44 terabytes of flash-based memory in 2U. So again, think of your data centers and think of how quickly and easily you can scale to some very substantial levels. Point being: "too expensive" is simply not true any more for the current state of the art.

Now, the second thing I invariably hear, particularly from my friends in the RDBMS world, is: hey, in-memory sounds great, but it's not durable, it's just cache, you're going to lose everything when the power goes out. And this is simply not true. All mature in-memory computing platforms have durable backups and disk-based storage. You can do very interesting things with active or passive replicas. You can architect your system for fault tolerance and failover inside a data center, or across two data centers, such that you don't lose any data, and you can configure any number of backups. In memory, you can run things transactionally, and you can do read-through and write-through to a variety of underlying technologies: RDBMS, HDFS, local swap space. Mature in-memory computing platforms always provide two levels of tiered storage beyond DRAM: certainly local swap, although I'd argue that if you're spending too much time in swap, you probably don't have your system architected or sized properly, but nonetheless that capability is there; and then, of course, the RDBMS and HDFS systems underneath.
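As a rough illustration of the read-through and write-through idea just mentioned, here is a hedged sketch with hypothetical names; it is not GridGain's actual store interface. An in-memory map sits in front of a durable store so that every write lands in RAM and in the backing store, and a miss after a restart falls through to the store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a write-through / read-through cache in front of a
 * durable store. The PersistentStore interface stands in for an RDBMS, HDFS,
 * or local disk; it is not a real GridGain or vendor API.
 */
public class WriteThroughCache<K, V> {

    /** Stand-in for the durable tier (RDBMS, HDFS, file system, ...). */
    public interface PersistentStore<K, V> {
        void write(K key, V value);
        V read(K key);
    }

    private final Map<K, V> ram = new ConcurrentHashMap<>();  // primary copy lives in memory
    private final PersistentStore<K, V> store;                // durable backup tier

    public WriteThroughCache(PersistentStore<K, V> store) {
        this.store = store;
    }

    /** Write-through: update RAM and the durable store in the same call. */
    public void put(K key, V value) {
        ram.put(key, value);
        store.write(key, value);
    }

    /** Read-through: serve from RAM; on a miss (e.g. after restart), reload from the store. */
    public V get(K key) {
        return ram.computeIfAbsent(key, store::read);
    }

    public static void main(String[] args) {
        // In-memory stand-in for a database, just to make the example runnable.
        Map<String, String> fakeDb = new ConcurrentHashMap<>();
        WriteThroughCache<String, String> cache = new WriteThroughCache<>(new PersistentStore<String, String>() {
            public void write(String k, String v) { fakeDb.put(k, v); }
            public String read(String k) { return fakeDb.get(k); }
        });

        cache.put("order-42", "FILLED");
        cache.ram.clear();                           // simulate losing the in-memory copy
        System.out.println(cache.get("order-42"));   // read-through restores it: FILLED
    }
}
```

That shape is what lets an in-memory platform claim durability: losing the RAM copy costs you speed until the data is reloaded, not the data itself.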
Now, here's the interesting thing that we see with data. We certainly have examples of really big data, but about 89% of operational data sets are 10 terabytes or less, and 10 terabytes is a very approachable number. I've got multiple clients with anywhere from 10 to 20 terabytes in memory across a relatively modest number of servers, quite frankly. When you get right down to it, it's very approachable to have your complete data set in memory. So "not durable" is simply not true; there is very much durability here. And a bit of a forward-looking statement: there's some very interesting work coming on non-volatile memory that is certainly going to change the hardware landscape substantially. So, lots of interesting things here, and myth number two definitely needs to be debunked.

Now, often I will hear that flash, meaning flash-based approaches, is fast enough. For example, flash on PCIe is, quite frankly, still a block device, and what that means is that it still looks like a disk drive. It is very much faster than the hard disk it's replacing, and I will not argue that; there are lots of advantages to it. But the point is that even with flash you're still going through the OS I/O stack, you're going through your I/O controller, you're on a different bus, there's all this marshalling and buffering that needs to be done, and if you're writing data that's smaller than the block size, there's just tremendous inherent inefficiency in all of this. If you really want to go fast, and coupled with the fact that the price is dropping and operational data sets are really 10 terabytes or less when you get right down to it, DRAM is very much the answer. It's just substantially faster in terms of I/O and access. And that, again, goes back to the hardware that underlies my software stack; I spend a lot of time talking about hardware because it's the enabler for the software stack. So, some things to think about when you're evaluating flash-based technologies: where is the data being written? Is it being written to a block device? Is it going in and out through the OS I/O stack?
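To put a rough, deliberately unscientific number on that block-device point, here is a small Java micro-benchmark sketch comparing random lookups from an in-memory map with random reads that go through the file system and the OS I/O stack. The absolute numbers will vary wildly with hardware, the OS page cache, and JIT warm-up, so treat it as an illustration of the two access paths rather than a real benchmark; all names are illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Illustrative (not rigorous) comparison of the two access paths discussed above:
 * an in-memory lookup is essentially pointer arithmetic, while even a fast block
 * device is reached through the OS I/O stack and buffer copies.
 */
public class MemoryVsBlockDevice {
    public static void main(String[] args) throws IOException {
        int records = 100_000;
        int recordSize = 256;

        // Build the data set twice: once in RAM, once on a block device behind the file system.
        Map<Integer, byte[]> inMemory = new HashMap<>();
        Random rnd = new Random(1);
        byte[] buf = new byte[recordSize];
        Path file = Files.createTempFile("records", ".bin");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            for (int i = 0; i < records; i++) {
                rnd.nextBytes(buf);
                inMemory.put(i, buf.clone());
                ch.write(ByteBuffer.wrap(buf));
            }
        }

        // Random reads from memory.
        long t0 = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < records; i++) checksum += inMemory.get(rnd.nextInt(records))[0];
        long memNanos = System.nanoTime() - t0;

        // Random reads through the block device / OS I/O stack (likely helped by the page cache).
        t0 = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer bb = ByteBuffer.allocate(recordSize);
            for (int i = 0; i < records; i++) {
                bb.clear();
                ch.read(bb, (long) rnd.nextInt(records) * recordSize);
                checksum += bb.get(0);
            }
        }
        long ioNanos = System.nanoTime() - t0;

        System.out.printf("memory: %d ns/read, file: %d ns/read (checksum %d)%n",
                memNanos / records, ioNanos / records, checksum);
        Files.delete(file);
    }
}
```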
Now, I mentioned earlier, when I kicked this whole thing off, that in-memory databases are sort of synonymous right now with in-memory computing. But in-memory computing, the whole notion of it, the platforms and technologies, is not really a product. It's a technology and an architecture that gets applied to different products, different payloads, in different ways, and you need to be able to do a variety of things together. So simple caching is simply not enough. Caching is great; it makes up for the deficiencies in disk drives, and there are a lot of great caching technologies out there. But don't confuse that with what you should really be considering, which is a full-stack, full-service in-memory computing platform. An in-memory database alone is not enough either. You will certainly get a performance gain there, I'm not saying otherwise, and it's important for today and easy to adopt because we're all familiar with databases. But if you're going to talk about in-memory computing, you have to be able to handle the ingestion of data and the processing of data, and with a database you certainly get queries, which leads to analytics, but you don't get any substantial processing across that data. So it's a great use case for today, but not sufficient as the entire platform.

Now, the other thing that's happening is streaming data. Streaming data, frankly, in many respects can only be supported on an in-memory computing platform: something with linear scalability, the ability to scale out, and the ability to store the data, which is why streaming-only technologies and frameworks are simply not sufficient either. The reality is you're either going to be integrating two to five different technologies, depending on what you want to do with in-memory computing, or you're looking at a vendor like GridGain that provides the entire stack. There's a very steep learning curve around those integration points. Where we see the market going is that vertically focused, almost use-case-specific, plug-and-play products are very much the future: essentially plug-in accelerators for technologies like Hadoop that provide the maximum benefit with the minimum of integration.

So I'm going to pause there and make a call for any questions you might have. I don't see anything in the Q&A section yet, but I'm happy to take questions and hopefully get things primed for the Q&A toward the end of the session. Before I finish up, I wanted to give you a couple of representative use cases. Obviously these are drawn from our customer base, but I think they highlight some very interesting things about the difference I'm talking about.

Let's talk about the first use case. It's a financial services use case centered on real-time risk analytics. In this specific case, it was a hedge fund with a mid-size book of about 1,000 options. With every market tick, the risk profile changes: essentially whether I'm going to make or lose money on these positions. Now, what's interesting is that to re-evaluate every option, the total data for these 1,000 options is only about a gigabyte. That doesn't sound like anything approaching big data, and probably doesn't even sound like an in-memory computing problem, but here's the catch: as you start to scale this, as you move to a bigger book, that obviously increases the amount of data you have to fetch each time. They started with a traditional architecture, an RDBMS underlying what was essentially a Java-based client application, and they were doing queries and pulling that data out. The problem is they quickly saturated the network when they wanted to double the size of their book, going from 1,000 to 2,000, and they were doing some deeper analytics on it as well, which increased the computational component. Something they tried was moving to a cache, just an ordinary cache: cache this data on the nodes and go from there. That's fine, but the reality, even with the caching, is that they quickly saturated the network again, because they still had to move data around as they scaled. Now, let's look at it from an in-memory computing standpoint, and I will readily admit this is biased, because it's from my perspective and my product, but nonetheless I think it's representative of in-memory computing platforms. I'll tell you why and how they chose us and what the advantage was. So, with GridGain, you have, again, the ability to compute.
You have the ability to store. So what they actually did is put together a comparatively small grid, quite frankly: I think they started with about 40 servers or so, 40 nodes, and they partitioned the data across those nodes. We configured one backup to make sure there was some fault tolerance and resiliency there; this data could be recreated, so that was sufficient for what they wanted to do. And now, rather than moving data around with every market tick for every position they hold, we send the computational unit out to the node where the data resides; the data is partitioned, and the compute goes to it. What's the advantage of that? Everything executes locally; it's all in-process on one machine, being accessed from RAM, and there is very little data moving around the network. What's very interesting is that they've since scaled their book: they went to just under 4,000 different instruments that they're holding in the market, so not just options. Before GridGain, a run on a quarter of that book took about 1.25 seconds or so; afterwards, on the full book, they were at about half a second, and they added more flexibility to their risk analytics model, knowing they can scale as they need to. By the way, I quote this customer all the time; he says he loves it because when he needs more capacity he can just rack and stack more boxes. Coming back to in-memory computing: you have compute embedded with the data, you have the data partitioned, and you have the ability to scale both, almost limitlessly, by adding additional nodes. So it's a very interesting use case, and they will be the first to tell you that they could not have done this in the absence of a technology like GridGain providing the in-memory computing platform.

Let's talk about a logistics use case, which is actually quite interesting to me; maybe it's my romantic notion of trains rolling down the tracks from my childhood. A modern locomotive generates terabytes of data. There's telemetry data (where am I, what am I doing) and there's sensor data across the entire engine: all of the componentry, fluid levels, all of the different things that make up a very complex engine being monitored. They generate terabytes of data and they stream all of it back. Now, as you can imagine, when you have hundreds or thousands of these locomotives, it adds up very, very quickly, and what this customer of mine found is that they simply could not ingest this stuff and write it out to traditional disk-based technologies fast enough; it simply couldn't keep up. It's very much like standing in front of a fire hose, and streaming data poses an interesting challenge for all of us as technologists: it never stops. There's no beginning, there's no middle, there's no end. I suppose there's a beginning, when you first turn the system on, but the point is there's no catch-up time. The data is always on; it's this giant fire hose blasting at you without ever stopping. And you only have two options to keep up: you drop data, or you have some sort of scalable platform that allows you to keep up.
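The "drop data or scale" choice can be made concrete with a small sketch. This is not how the customer's system or any GridGain streaming API is built; it's a generic Java illustration, with illustrative names, of a bounded in-memory ingest buffer where a producer that outruns its consumers must either discard readings or be given more consumers, which is the scale-out option.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.DoubleAdder;

/**
 * Generic sketch of the fire-hose problem: a sensor stream that never stops,
 * a bounded in-memory buffer, and the two ways out -- drop readings when the
 * buffer is full, or add consumers (scale out) so the buffer never fills.
 */
public class FireHose {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<double[]> buffer = new ArrayBlockingQueue<>(10_000); // bounded: RAM is finite
        AtomicLong dropped = new AtomicLong();
        AtomicLong processed = new AtomicLong();
        DoubleAdder totalSignal = new DoubleAdder();

        // The "consumers" stand in for grid nodes doing in-place processing;
        // raising this number is the scale-out option.
        int consumers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    double[] reading = buffer.poll(100, TimeUnit.MILLISECONDS);
                    if (reading != null) {
                        double sum = 0;                      // trivial stand-in for real analytics
                        for (double v : reading) sum += v;
                        totalSignal.add(sum);
                        processed.incrementAndGet();
                    }
                }
                return null;
            });
        }

        // The producer never waits: when the buffer is full, the reading is dropped.
        long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        while (System.nanoTime() < end) {
            double[] reading = new double[64];               // one telemetry sample
            if (!buffer.offer(reading)) dropped.incrementAndGet();
        }

        pool.shutdownNow();
        pool.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println("processed=" + processed + " dropped=" + dropped
                + " totalSignal=" + totalSignal.sum());
    }
}
```

In a real grid the consumers would be nodes holding their partition of the data and doing the analytics in place; the point of the sketch is simply that once the buffer is bounded, the only honest alternatives are dropping data or adding capacity.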
Now, ingesting this data and making sense of it is one thing, and it's actually quite interesting; it is one of the problems to solve. But then you add an additional computational layer of complexity on top: what happens if I have a switch that's broken? Well, now which trains do I route around it, and in what order, so that I can meet my SLAs on cargo delivery? And you can optimize this for minimizing financial penalties, or for maximizing savings on fuel; there are all of these different factors that go into it. So you've got a complex computational model running, you've got streaming data against which you need to react in real time, and quite frankly, figuring this out overnight or tomorrow doesn't help you very much. Now, this is a net-new set of technologies and problems for my customer, because four or five years ago none of the locomotives had this level of sensors; the data volume simply wasn't there. And you can't do this, you can't scale the data and the compute across it, if you're shipping data around to processing nodes that do the processing later or somewhere else. You have to embed the compute in place with the data and do in-place analytics on the data. So, net new. I hope you're noticing a bit of a pattern here, which I'll point out: I see an ingestion of data, a computational model, and the ability to store a volume of data across an in-memory data space, and at least two or three of those show up in almost every use case. That's the way you can evaluate your own use cases and what in-memory computing can bring to you in terms of benefits.

Power generation is a very interesting one as well. I had no idea, or I guess I never thought about this before I got involved with this customer, but it is very, very expensive to turn on a new turbine or turn on an entire power plant. At the point that a generating company reaches its capacity, it's essentially delivering all the power that it can without flipping a switch and turning something else on, and it's a very complex decision that drives that. You'll have companies that are essentially vertically integrated: they're selling to the market, both commercial and residential; they are trading energy on the spot market; they have generation, and they have co-gen capability, to deliver power to the grid to meet this demand. And what's fascinating is that when they're making this decision, in this case there are 28 different factors they consider just among environmental factors: what the date is, how much gas is available, how this compares to historical norms, what the temperature is, what the forecasted temperature is in the next 30 minutes, the next hour, the next 24 hours, and so on. So what you've got is all of this data streaming in: coming from sensors, coming from the smart meters, coming from all my environmental sensors, coming from my power plants. You've got to ingest all of this data and make sense of it, put it in some kind of representation that you can then do analytics across. You have historical models for what happens when it's 46 degrees and it shouldn't be, when it's outside of the historical norm. And you run a complex analytics process: you keep a subset of your operational data in memory along with your historical data, and you need to be able to scale this up in order to make that decision. Do I turn on a plant? Do I go to the spot market?
Or do I do nothing, because I don't think I can make money on any of those moves? So it's a fascinating, fascinating use case. Again, it's that sort of net-new analytics: they were doing bits and pieces of this before, but now they've got smart meters and better sensors, better instrumentation rather, on all of their equipment. It's a fascinating problem, and again, streaming plus data plus compute is very much the ideal use case there.

And finally, one case of many others that we could talk about: oil and gas drilling. So apparently, who knew, there are seismic monitors across all of the fracking fields. After you hear this, it sort of makes sense, because those seismic monitors pick up all sorts of data that gets used to determine what's going on at the drill bit, down at the drill head. The drill bit itself is also instrumented and sends data back, and they are, in theory, supposed to combine all of this information to ensure that there's not some sort of environmental impact or issue with what they're doing. And this is not approachable by anything other than an in-memory computing platform, because you need to be able to scale it linearly: you have streaming data, you have a computational model, in this case analyzing all of the seismic data, and you need to be able to keep some of this seismic data available to you in memory for a certain period of time.

So with that, I've come to the end of the slides that I wanted to cover today. I am happy to do Q&A here insofar as we've got questions to answer. That would be great; we typically see a whole lot of questions raised after these presentations. So we've got a good question coming in here: is this a commodity scale-out technology, or is it a scale-up technology intended to replace some of the big, heavy-duty appliances that I might be using? Speaking from GridGain's perspective, I am happy being a scale-out technology along with a scale-up technology; you can do any combination of both. I've got folks that scale up and out, and I've got folks that just scale out on commodity hardware. The interesting thing when you compare it to some of the dedicated appliances is this: the dedicated appliances are great insofar as you fit within their performance characteristics, but when you bump into the top of that, it either gets extraordinarily expensive or it becomes technically impossible or impractical to scale further. With an in-memory computing platform, which is, remember, distributed from the get-go, you can just continue to scale, as I say, in an almost unlimited fashion.

Next question: in the use case with streaming input data and compute, how is the scaling of the data storage and the parallel processing done? Well, in our case that was obviously done with GridGain, and we actually come from high-performance computing roots; I probably just gave myself away as a native Canadian there. For the ingestion side of this, essentially you have to be able to do compute. Now, they're not doing MapReduce per se; they were actually just doing atomic task execution across their data set. And the data storage was run across a set of commodity servers; they had about 150 servers for their first go-round of this, and they were relatively average profiles.
I can't recall specifically the overall memory component on this, but it was in the tens of terabytes; they had tens of terabytes on that, across, as I say, about 150 nodes or so for that first round. If you have any other questions, now would be a very good time to ask them via the chat capability.

Hi, let me just jump in here really quick. I know one of the most common questions we always get is whether people are going to get a copy of the presentation and the recording; just a reminder that we will send out a copy of both within the next two business days, along with anything else requested throughout the webinar. And yes, please, I always brag to John about how active you guys are, so definitely jump in and add your questions to the Q&A. We see a couple more coming in. John, I'll share the next question.

Yeah, absolutely. So we've got a question about data modeling and whether it's applicable to this technology. I think data modeling is certainly applicable; it depends on exactly what you mean, if you'd like to clarify the question for me. But with respect to data modeling itself: the simplest distillation of what the technology is under the covers is that we are a key-value store, with an object representation of your data in some fashion. Now, I'll also give you a JDBC-based view of the world, meaning a table-based view of the world; it's the same thing under the covers, key-value, but you can expose it as tables and issue straight-up SQL queries against us as well. I'm actually going to be releasing, at the beginning of next year, a straight-up document API, and we're going to be introducing additional APIs as well. You can get at the data programmatically; the simplest way is puts and gets, obviously, against that object representation. We give you a wide variety of different ways that you can do it. So typically, for us, you have an object representation of your data, and you can obviously query it. Data modeling is certainly applicable to that, and on the object side we support the standard O/R mappers, Hibernate, things like that, as well.

Okay, I had another question come in, John: what's the high-level architecture and flow of GridGain? Well, we had the luxury, I guess, or the foresight, I'm not sure which to say, Shannon, of taking a greenfield approach to our design many years ago now, and we consciously made the choice to make our deployment architecture as simple as possible. So we have a very lightweight node deployment, the notion of a node, and at startup every node is equal. That's not to say that some nodes don't become data nodes or take on specific roles; you do have logical caches that span physical nodes. The point really is that it's simple: very simple to start, very simple to operate. Essentially anywhere you can run a JVM, you can run a GridGain node, which will then take maximum advantage of the underlying hardware it's running on. I'm not going to say that this is a practical use of GridGain, but we do amuse ourselves every now and again by firing up GridGain nodes on an Android handset.
And the reason I say that is to really illustrate how lightweight and straightforward the deployment architecture is. At a high level, in terms of getting data in and out: if you can do a put and a get, if you've ever used a hash map, you can certainly get rolling with GridGain. And in terms of task execution, it's a standard model; it's probably one of the most familiar ones at this point because of what Hadoop has done for the industry. It's very straightforward: if you are a standard enterprise Java programmer, you will have no issue working with GridGain. And what we pride ourselves on is that once you've written your application to work with GridGain, you can run it on two nodes, 20 nodes, or 2,000 nodes, and the application knows no difference, except for the fact that it has more processing capability and data storage available to it. So it's very seamlessly scalable from that standpoint.

I'll switch to caching a little bit, John; the next question goes a little deeper into that. Do you exploit any cache-sensitive data structures, such as cache-sensitive index structures, or the processor caches, such as L1 and L2?

So, we don't do anything specific there. Remember, we're abstracted at the JVM layer, so I am not going down to the hardware level and taking advantage of any specific processor caching, L1, L2, anything like that; I'm simply not doing that. I also don't do any of the FPGA stuff; I've got some folks using that, but I don't do anything specific for it. And that's for a couple of reasons: one, because we don't need to. We've done some fairly deep optimizations inside the product to make sure things are performant. So the short answer is that we don't do anything specific with those, because we've written all of our own caching technology to operate at the JVM level. I'm happy to talk about specific use cases after the event; I'm not too hard to find. You can reach me at GridGain, on LinkedIn, or on Twitter, a variety of different ways.

I'll get your contact information out in the follow-up email, too, so everyone has it. And the next question is: how is this different from hash processing, and what is the maximum limit of processing? Well, there is a theoretical limit to the processing, namely the addressable memory space of a 64-bit processor, but practically there is no limit. We have the product optimized in terms of the way that we store data and operate on data; you're partitioning data and processing across the nodes in the architecture, and the point is that if you need more capacity, you simply add capacity in the form of additional nodes, on additional hardware or in virtual or public cloud environments. That's one of the advantages of in-memory computing technology in general, and GridGain specifically: you can simply spin up additional processing power. And it's not just simple task execution; we have a very robust set of parallel execution technologies, MapReduce, MPP, RPC, and so on; I'm fully acronym-compliant, I will admit. So I think that essentially answers the question.
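Since the put/get, "feels like a hash map" style of access has come up a few times, here is a tiny, generic Java sketch of what that programming model looks like. A plain ConcurrentHashMap stands in for a distributed cache, and the Position class and its fields are purely illustrative; this is the shape of the key-value, object-representation access being described, not GridGain's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Generic sketch of the key-value, object-representation programming model
 * described above: store a POJO against a key, read it back, aggregate over values.
 * A plain ConcurrentHashMap stands in for a distributed cache; the Position
 * class and field names are purely illustrative.
 */
public class PutGetSketch {

    /** Illustrative domain object ("object representation of your data"). */
    static final class Position {
        final String symbol;
        final int quantity;
        final double price;

        Position(String symbol, int quantity, double price) {
            this.symbol = symbol;
            this.quantity = quantity;
            this.price = price;
        }

        double notional() {
            return quantity * price;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a partitioned, in-memory key-value store.
        Map<String, Position> positions = new ConcurrentHashMap<>();

        // Puts: the simplest way to get data in.
        positions.put("pos-1", new Position("ACME", 1_000, 17.25));
        positions.put("pos-2", new Position("GLOBEX", 500, 42.10));

        // Get: straight key lookup, no SQL required.
        Position p = positions.get("pos-1");
        System.out.println(p.symbol + " notional = " + p.notional());

        // A query-style operation over values (what SQL or a compute task would do at scale).
        double total = positions.values().stream().mapToDouble(Position::notional).sum();
        System.out.println("total notional = " + total);
    }
}
```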
Now, in terms of hash processing: we are much, much more than just a simple hash table. Yes, under the covers you can think of it as a key-value, hash-style store, but with a great deal of additional capability on top of that.

Thank you. Going back to your earlier reference to backups: does this imply replication? That's a great question, because it gets at how you compare in-memory computing technologies with database and data storage technologies. So yes, we do replicate, and there are a couple of subtleties here, or a couple of different ways I can interpret the question. On a node basis, yes, I am going to replicate data. Let's take the data storage part of this for a second. On one side of things, at the extreme, there's a fully replicated data store. What that means is that all data resides on all nodes, similar to the caching use case: you have all data in all places, which is very good for read-mostly workloads. That's limited by the fact that the largest memory space you can have in a fully replicated environment is determined by the smallest memory space available on the nodes that make up your topology. The other side is a partitioned approach, and that is where we see most of our customers going. In that case, you get to choose how redundant you want things to be. By way of example, you can set the number of backups that you would like to have for the data across the nodes; typically we see two backups as sufficient to guarantee the level of availability that folks want.

I'll give you a real example. I've got a customer that runs us across two data centers: a couple of thousand nodes, split between New York and the UK. It's actually one giant grid that runs across both, but it has geo-local preference, so the clients in New York have a bias to operate against data that is in New York, and conversely the same thing for the UK data. In that case, we see two backups: one geo-local, on a different physical server in the New York data center, and then one remote. In this way, you can have a node fail in New York and still process in New York, or have the entire New York data center go down and still be able to process, with the UK data center picking up the workload. So that's one way we replicate, and we are very good about how we do it. One of the things that you should always look at when you're evaluating in-memory computing technologies is to ask your vendors the question: what happens when I add additional nodes? The answer for GridGain is that we add nodes dynamically and rebalance dynamically, such that there's no interruption in the processing of data. You should make sure you actually check on that. Then there's the notion of data center replication: if you're running active-passive across two data centers for DR, there is also the ability to replicate, either transactionally or non-transactionally, to that backup data center. So that's the way that we certainly handle that.

I love all of these questions coming in. The next question, John, is: does this technology support data access via standard SQL APIs, or does it require using proprietary GridGain APIs? So, the answer is both.
If you want to query us, and to be clear it's read-only SQL access, but that's typically sufficient for what folks want to do with SQL in terms of analytics and querying, you can just make a straight JDBC connection and run standard SQL: select * from employees where company equals GridGain, order by hire date. Standard SQL will run across that; we handle everything that needs to be handled underneath the covers to get to your data and give you a result set back. But you can obviously also use the GridGain APIs to go and get that data. In the simplest fashion that's puts and gets, with full transactional support, ACID-compliant transactions, and a very robust, rich set of APIs available to you. So the answer is both; whichever way you would like to access us, you can. As I mentioned earlier, I'm also going to be adding a JSON document API, actually the MongoDB API, which has become something of a de facto standard for document-based access, and that's going to be available at the end of Q1 2014. We really like to make it easy to get data in and out, and to query and work with data in the ways that you're already familiar with, as standard as possible.

Next question: does adding additional memory, storage, and processing impact the I/O, or does it impact CPU processing? That's an interesting question, and I'm not sure I fully appreciate where the attendee is going with it, but obviously at some point there is some I/O sensitivity, and typically we help our customers balance the hardware profile in terms of how much memory they want on the nodes. The CPU impact really has more to do with the types of tasks and processing that you're doing than with the retrieval of data per se; obviously things are very quick when you're operating on and processing data in memory. So it's a balance between those two. I have scale-up customers that pack on the CPU power and a whole lot of memory in a single box; sometimes that's appropriate, sometimes it's more appropriate to have commodity boxes, and sometimes, frankly, it's simply a financial consideration for my customers. There's no hard and fast rule for how all of these things interrelate; it's very much use case specific. With respect to all of that, we've done a lot of optimizations, everything from our marshalling and serialization routines to our inter-process communication; it's highly efficient, and we are not chatty, as it were, in terms of our protocols.

And another question came in: reviewing GridGain's website, I don't see Oracle as a supported product; is there a reason that Oracle is not supported? Well, it is absolutely supported; I am going to go find our webmaster and have a discussion. We may not specifically call it out one way or the other, but Oracle would simply be an underlying data store for us, and we support almost anything under the sun. Name a database technology, within limits, and I've probably got a customer that's actually running it. I have many, many customers running Oracle underneath us, including Oracle RAC. NoSQL stores as well, and on the relational side Postgres, SQL Server, Sybase, all of the usual suspects in terms of database technology; Mongo, for that matter, Riak, a bunch of different things run underneath us. Straight HDFS as well.
I've also got some folks that run us against network-attached storage and SAN-type storage, and flat files, so a lot of different things. So, George, don't worry: Oracle is fully supported, a first-class citizen for working with GridGain, certainly.

Okay, and we are getting close to the top of the hour here, so I just have one last question for you. We obviously have a lot of architects on the line: any advice, in terms of architecture, any best practices you want to throw out for incorporating GridGain?

Yeah, so what I will say is this: when you're evaluating these technologies, perhaps an in-memory database is sufficient for your needs today, but make sure that you look forward to where you need to be in just 12 to 18 months; you're going to see a huge change over that time. Some things to look for in these types of technologies: ask about scalability, and ask your vendors to prove their scalability if they're claiming elasticity or scale. I'm happy to do that, through my customers, for anybody who asks me. Make sure that you understand your fault-tolerance strategies. Make sure that you understand and appreciate the trade-offs between the various ways that you can do read-through and write-through, and the ways that you can create backups across the entire infrastructure. And one of the big ones, which I'll come back to because I just pointed it out: make sure you understand how you can add nodes. Your cluster might start out at a fixed size, but what happens when you dynamically add nodes? Can you do it at all? Can you do it without interruption? What actually happens? I'll give you a great example on this, in terms of resiliency when you're having issues with nodes dying. I had one financial services customer that put me through a grueling test: they would start 100 nodes, they had a script killing processes across the grid every 10 seconds, and then, just for fun, they had an engineer standing behind the network rack pulling out Ethernet cables. We came through that with no data loss, but make sure that you put any technology you're evaluating through that kind of test. Look at your vendors; look at the stack, the trade-offs, and the maturity. We're lucky that we came from open source roots; we've had five major releases, even though we may look like a young company. So, off the top of my head, those are the kinds of key things that we always help our customers with, and that I think your architects should be thinking about when they're adopting in-memory computing.

All right, John, thank you so much for this great presentation and explanation, and thank you, everybody, for the active questions. I love it, as you know, when you guys get involved, and we've seen a lot of questions for the speaker. John, thank you so much, and everybody, thank you for attending; thanks as well to GridGain for sponsoring this webinar. Of course. And again, I'll get the email out within two business days containing links to the slides, the recording of this presentation, and John's contact information, so if you have any further questions you can be sure to get him the appropriate information. So, thanks everybody. And John, if you don't mind sending me a copy of the presentation, I'll get that out to everyone. That would be great. Thank you so much.
Thank you all for joining and sticking with us through the presentation. I will have the slides to Shannon in about 20 minutes, and she will get them out to you shortly. I look forward to hearing from some of you, and if you have any questions after this, please don't hesitate to shoot me an email; I'm always happy to help. Thanks. Thanks so much. Thanks, everybody. Have a great day.