We'll do a quick intro and we'll get the discussion started here. Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor of DATAVERSITY. We would like to thank you for joining our second installment of the new monthly DATAVERSITY webinar series, NoSQL Now with Dan McCreary. Today Dan will be discussing optimizing databases for solid state drives with three guest panelists. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag NoSQLNow. As always, we will send a follow-up email within two business days containing links to the recording of this session and any additional information requested throughout the webinar.

Today we have three esteemed guests joining the panel for the discussion: Dave Rosenthal, Co-Founder of FoundationDB; Brian Bulkowski, Founder and CTO of Aerospike Incorporated; and Tim Callaghan, Vice President of Engineering at Tokutek. Their titles alone don't do justice to their impressive resumes. And moderating the panel is DATAVERSITY's partner and friend, Dan McCreary. Dan is principal of Kelly-McCreary & Associates. He is an enterprise architect and author specializing in emerging database technologies, and he and his wife Ann recently published the book Making Sense of NoSQL, a guide for managers and the rest of us, which you can find in the DATAVERSITY bookstore under featured books. And with that, we'll turn the floor over to Dan to get the discussion started.

Hello and welcome. Sorry about that, I forgot to unmute myself. There we go. All right. Hello. I just wanted to reach out and thank our panelists. At the NoSQL Now conference, we had three very good presentations done by people that have been pretty much writing their databases from scratch for solid-state drives. And so I wanted to invite some of the leaders in this area and really talk about why these databases can't just run, can't be modified very quickly to run on solid-state drives. I think there's a lot of very new and very deep technology that these guys are working with. So I'm really honored to have all three of you guys here. We'll just start out going left to right, and we'll have each of you introduce yourselves and maybe just tell us a little bit about how you got into the solid-state drive area, and then we'll go on from there. So, Dave, would you like to get us started off here?

Yeah, and thanks for having me. So, yeah, my name is Dave Rosenthal. I'm one of the co-founders and sort of architects and technologists behind FoundationDB. And we've been working on, you know, building a database, as have these other guys, for a little over four years now in my case. We sort of designed from day one for SSDs. And when we got started, these Intel X25 drives were sort of just out on the market. And for the first time, you know, SSDs were looking sort of viable. And they had some weird characteristics in terms of, you know, write performance versus read performance, and I'm sure we'll get into all of that. But one of the things I think we realized is that you sort of need to design a database a little bit differently to use SSDs than you would a normal rotational disk.
So looking forward to the conversation. I personally have done a bunch of the sort of hands-on testing on various SSDs — you know, we've bought one of pretty much everything on the market. I have a box full of literally hundreds of dead SSDs that we've beaten into the ground. So it's been fun. Thanks.

All right. Brian, tell us about yourself and where you're from and what got you into solid-state drives.

Thanks, Dan. It's a good opportunity to be able to speak here, and I appreciate it. So I was working at an advertising company back in 2007-2008, and I really saw an opportunity to help companies cope with scale. I saw people trying to achieve things that were like the front page of Yahoo at the time, but trying to do it as a startup with only a few people, and really running into a lot of problems. And you could do all that stuff with in-memory databases at the time, true RAM databases, and that's really what a lot of people did. But solid-state was going to be the coming technology, which was really going to radically change things. So I'm in my late 40s, and I've been, you know, blocked in terms of high performance by storage for a long, long time. We've all been living with seeks being very slow in databases for pretty much most of our lives. So flash is a super exciting technology. And when we looked at the price performance of flash, especially as Dave mentioned with the Intel X25s, there was a lot of possibility there. I also started talking to a guy I knew from a couple companies back named David Flynn, who ended up being one of the founders of Fusion-io. He told me a lot about the underlying chip architecture and what the firmware has to do for every drive. So we really started optimizing the data structures in our ground-up rewrite at that point.

Fantastic. And for people that don't know about Fusion-io, that's a company that really does focus on the actual hardware itself. Is that a good summary?

Absolutely. They are one of the preeminent PCIe card flash vendors that really took a look at how to do direct-attach storage on that card. Now, of course, there are many vendors: Violin now has a similar kind of product out, Micron has some great products, and we now have the Huawei card on the bench — we're actually very excited about some of that performance. And like Dave, we do a lot of testing. We have a tool called ACT, the Aerospike Certification Tool. We run a lot of drives through that in a lot of configurations, whether it be wide SATA or PCIe, and really try to show where different vendors are in terms of price performance, especially with low latency.

Great. Okay. So, Tim, tell us a little bit about yourself and how you got into solid-state drives.

Sure. You know, if you look me up on Twitter, I call myself a database junkie. After graduating college, I was doing dBase and Clipper early on. Then I discovered Oracle, wrote my first SQL, and pledged to not do any more Clipper development. I did that for a very long time, and then went over to VoltDB for a couple years doing product management and engineering, and really fell in love with the high performance characteristics of an in-memory database. And then I discovered TokuDB a few years later, at Tokutek. So what we have here is a technology called Fractal Tree Indexes. And the story from an SSD perspective, I think, is unique on this panel in that TokuDB and the Fractal Tree Indexes themselves were built at a time when spinning disks ruled and SSD was much further down the horizon compared to where it is today.
So the Fractal Tree Index, compared to a B-tree, is all about eliminating unnecessary I/O. Given the expense of an I/O operation, it was critical to try to avoid it as much as possible. So Fractal Trees were born really to solve the problem of spinning disks. An interesting side effect that we found is in the aspects that we baked into the Fractal Tree, number one being compression. SSDs — I'm sure we'll talk about this later — SSDs are really expensive. So compressing the data that goes on them, I think, goes a long way toward making them affordable for your standard end user. Another interesting aspect of an SSD is that you can almost say IOPS are free. Devices like Fusion-io can do 100,000-plus I/O operations per second, but there's an expense that comes with those IOPS. So you can get the data very quickly, but then you've got to decompress it, deserialize it, put it into the cache, evict other items. So the freedom of an IOPS is kind of deceiving, and when I say free, it's in quotes, because as I mentioned earlier about compression, they're really expensive devices themselves — or they can be very expensive. And the last piece about our technology — someone earlier mentioned having stacks of drives that they've just run into the ground — Fractal Tree indexes do things that are really interesting here: we do infrequent writing, and we write big things, and I think those make our technology very flash-friendly and can eliminate the concerns about wearing out devices before their time.

Thank you very much, Tim. That's great. So I'm going to go on to a couple of questions, and I guess the format initially is we're going to just do a round-robin format. We'll pick a question, and then each of you answer it in turn, and then we'll kind of break it up and get more interactive after this first round. So let's just start out with why now. Why is it that SSDs are becoming popular now, and why is it that we're seeing a whole new generation of vendors that are actually writing their software from scratch? Dave, do you want to take a shot at that initial question?

Yeah, I'll take a stab. I'm getting a bit of an echo there, though, so apologies. Why are SSDs becoming popular? I mean, I think we sort of all know the obvious reasons, so I won't just say that, you know, the price has been coming down and the performance is going up. That's sort of the obvious stuff. One of the interesting things that's happened just over the past maybe a year, from my perspective on the SSD market, is that there's been a lot of maturing of the controller technology on the SSDs, to the point where different SSDs from different vendors are finally starting to work pretty well in not just one workload of streaming writes or streaming reads or pure random reads or pure random writes — they're starting to work better in real-world workloads of random reads and random writes that are mixed together. And they're also starting to work better when they're full. One of the things that's interesting about SSDs is — and we'll talk about this — normally there's spare capacity that's over-provisioned inside the SSD. And to work well, they need to have some spare capacity to work with. And one of the challenges with existing SSDs is that if you fill them up, they start to work worse and worse, especially for writes.
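To make that over-provisioning point concrete, here is a toy model — purely illustrative, not anything the panelists described — of a simplified flash translation layer doing greedy garbage collection under random 4K overwrites. The block counts, page counts, and spare fractions are all assumptions; the takeaway is just that write amplification (physical page writes per host write) climbs steeply as spare capacity shrinks, which is the "works worse and worse as it fills up" behavior Dave mentions.

```python
# Toy model of SSD write amplification vs. over-provisioning.
# Illustrative only: a simplified FTL with greedy garbage collection,
# random overwrites, and made-up geometry. Not a model of any real drive.
import random

def write_amplification(num_blocks=128, pages_per_block=64, spare_fraction=0.25,
                        host_writes=200_000, seed=1):
    """Return physical page writes per host write under random overwrites."""
    random.seed(seed)
    total_pages = num_blocks * pages_per_block
    logical_pages = int(total_pages * (1 - spare_fraction))  # user-visible space

    valid = [set() for _ in range(num_blocks)]   # live logical pages per block
    where = {}                                   # logical page -> block holding it
    free = set(range(num_blocks))                # fully erased blocks
    current, fill = free.pop(), 0                # open block being filled
    physical = 0                                 # total page programs

    def program(lp):
        nonlocal current, fill, physical
        if fill == pages_per_block:              # open block is full: grab a new one
            current, fill = free.pop(), 0
        valid[current].add(lp); where[lp] = current
        fill += 1; physical += 1

    for _ in range(host_writes):
        lp = random.randrange(logical_pages)
        if lp in where:
            valid[where[lp]].discard(lp)         # invalidate the old copy
        program(lp)

        # Greedy garbage collection: keep at least two erased blocks on hand,
        # always reclaiming the block with the fewest still-live pages.
        while len(free) < 2:
            victim = min((b for b in range(num_blocks)
                          if b != current and b not in free),
                         key=lambda b: len(valid[b]))
            for moved in list(valid[victim]):    # relocate live pages (extra writes)
                valid[victim].discard(moved)
                program(moved)
            free.add(victim)                     # erase the victim block

    return physical / host_writes

for spare in (0.40, 0.25, 0.12, 0.05):
    wa = write_amplification(spare_fraction=spare)
    print(f"spare capacity {spare:4.0%}  ->  write amplification ~{wa:.2f}")
```

With generous spare capacity the drive rarely has to relocate live data; as it approaches full, almost every host write drags several internal relocations along with it, which is why write performance falls off.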
So to me, one of the interesting things about why SSDs are becoming popular in the last couple of years is that there's actually been a bit of a convergence around the capabilities of the translation layer and the actual SSD controllers, to make it so that different drives are becoming a little bit more commoditized and a little bit more interchangeable. I think we're still in a place as an industry where we as database vendors need to be experts on behalf of our customers about SSDs. But I think, rolling forward a few years, hopefully they'll be a little bit more commoditized, much like, for example, a 10,000 RPM rotational disk is today. And I'll leave you with that.

I think your summary is that it's not just the underlying technology of the solid state memory itself, but it's actually the controllers that have been highly optimized for different workloads since then.

Yeah, all the controllers from the different manufacturers are starting to converge in terms of capabilities. I mean, they're not converged, but they're converging in terms of capabilities. And so rather than one drive being good for one thing and another drive being really good for something else, there's starting to be a little bit more of an equilibrium, which makes them interchangeable in the market. So that's really important.

All right. Brian, tell us a little about what's your take on it. Why SSDs now, and what are the big trends you're seeing?

First of all, I'd say that what Dave said there is entirely accurate. We're seeing a lot of the same things. Let me highlight one point about the controller, which is, as controller technology has been maturing, we saw basically two or three companies come out, like SandForce, and say: we are the controller company, trying to cut across a lot of different manufacturers. And that was sort of a one-size-fits-all approach. And honestly, I think it was good for the time, but we're starting to surpass that. So the fact that Micron has its own controller and has its own driver; the fact that Intel has basically had two different products for controllers, and one of those was clearly superior and moved forward; Huawei, again, as I mentioned, has a really interesting controller that's little known in the U.S., but it's their own silicon — well, at least their own FPGA; Samsung as well has its own controller. All of that is a process that takes years, spinning up the design phase, going through multiple revs of controller tech, and we're really starting to see the benefit of it.

But frankly, I think there are two other issues that really have to be highlighted, and one is simple acceptance. Your friends say, oh yeah, I installed SSDs, I installed flash technology, it worked, it didn't burn out — they do seem to be more reliable than rotational disks for this workload. That's a mounting body of evidence. Operational people are hearing from their friends that flash now works, where the X25 had some problems: it was pretty easy to burn out, it had problems with disconnects and stuff like that. That word of mouth is priceless for the industry. And then the other point is simple price. The difference between what RAM costs and what flash costs: we now have devices like the Crucial M500, which is Micron's consumer brand, currently being priced at half a dollar a gig. We have rumors that the exabyte of flash Facebook is buying is probably around a buck a gig. Apple's big build-out, rumor has it, is also around 50 cents a gig. That's a far cry from the $15 a gig that the early Fusion-io cards cost.
We're now seeing the high end of the market, in high-end SLC drives, at about $8 a gig. So price has really plummeted and densities have increased, as well as that word of mouth.

Brian, tell us what SLC drives are.

Oh, I'm sorry. There are two sort of core technologies — a lot of the different chip vendors have some other terms — but the two acronyms to know in flash chip technology are SLC, which is single-level cell, and MLC, which is multi-level cell. MLC drives tend to be quite a bit slower, but also quite a bit denser and cheaper. SLC drives are especially good at high write endurance as well as high write throughput. So you tend to see them at different tiers. As far as the industry goes, it has really shifted in the last year and a half to be almost entirely MLC-based. Now — imagine me waving my hands a little here — there's also TLC, which is the triple-level cell stuff that Samsung's got out, and Intel has this stuff that they call eMLC, which is also a little different. So there have been some process improvements at the chip guys that they like to highlight, but there are sort of two big families, MLC and SLC. And the SLC drives are currently running 50 cents to three bucks a gig chip price, and that's really rippled throughout the industry.

So there are underlying technologies, and each of these underlying technologies has different characteristics for read and write performance. Interesting.

Now, just because we're on this topic: at the Flash Memory Summit down in San Jose a couple months ago, we saw both Samsung and Intel present some very interesting roadmaps that show real legs on SLC — sorry, on flash density. We can expect about 10 years' worth of Moore's Law-style improvement, a 2x improvement every 18 months. They've shown some internal, very believable process improvements that will take us through at least the next factor of 10 to factor of 16 of scaling.

That's good news. Tim, tell us your thoughts on why SSDs now and what trends you're seeing.

I've got a few things. I can certainly reiterate price. I think nothing drives adoption more than affordability. For the masses now, you can get quality off-the-shelf devices that are safe to run databases on. (Maybe some people that are not talking, if you could turn your phones on mute. That's fine. Is that any better? Yeah.) But that's not all it is. So certainly, as I'm hearing mentioned, Fusion-io and others — there are high-end flash devices, PCIe, there are high-end enterprise-grade SSDs, and there are consumer off-the-shelf versions of both. You can buy, you know, PCIe flash cards or consumer-grade SSDs and kind of roll your own servers at those prices. I think that's what's changed.

There's been a fair amount of uncertainty and doubt about devices wearing out. I think people think about, well, each cell within this solid-state device can only be rewritten a thousand times or 2,000 times. And you might really be concerned to think: if my database is running at peak throughput, I'm going to wear this device out in months or, you know, a short number of years, and I'm going to be replacing these devices all the time. I personally am still scouring the Internet to find evidence that in large quantities these devices are wearing out, and I just haven't seen enough of that for the concern to be valid in anything reported.
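As a rough sanity check on that wear-out worry, the back-of-envelope math usually looks something like the sketch below. Every figure in it — capacity, rated program/erase cycles, sustained write rate, write amplification — is an illustrative assumption rather than any vendor's spec, but it shows why the answer swings between "months" and "years" depending on the workload and the part you buy.

```python
# Back-of-envelope endurance estimate for the wear-out concern above.
# All numbers are illustrative assumptions, not vendor specifications.

def drive_lifetime_years(capacity_gb, pe_cycles, write_mb_per_sec, write_amplification):
    """Years until the NAND's rated program/erase budget is exhausted."""
    total_writable_gb = capacity_gb * pe_cycles                       # raw endurance budget
    gb_written_per_day = (write_mb_per_sec / 1024.0) * 86400 * write_amplification
    return total_writable_gb / gb_written_per_day / 365.0

# A hypothetical 400 GB MLC drive rated for ~3,000 P/E cycles, sustaining
# 10 MB/s of database writes around the clock, with write amplification of 2:
print(f"{drive_lifetime_years(400, 3000, 10, 2.0):.1f} years")   # ~1.9 years

# The same workload against a consumer part rated for only ~1,000 cycles:
print(f"{drive_lifetime_years(400, 1000, 10, 2.0):.1f} years")   # ~0.6 years
```

Bump the sustained write rate up or the rated cycles down and you land in the "months" territory people worry about; keep writes modest, compress the data, or buy a higher-endurance part and the same arithmetic gives you years.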
But the piece that I can certainly speak to, in terms of our experience and usage in the market, is the raw performance it brings to your workload. So we make a product that competes with the InnoDB storage engine, which is the main way to store data in MySQL. If your database largely fits in memory, it's a very performant database; it tends to slow down quite a bit when the database is larger than RAM. And what you find these flash devices and SSDs doing is really raising the bar for performance in MySQL, or MongoDB for that matter, when you've got really good performing flash behind it. So when you can't go to main memory for the data you want to operate on, a flash device can really improve performance. It's not as good as in-memory performance, but it's certainly far better than going to a spinning disk for your workload. So I find that the biggest thing there is that people, for not a lot of money, can have a new server with better storage, be it SSDs or PCIe flash cards, and they might get a 5x performance improvement. No application change, no substantial changes — nothing other than better hardware.

Right, that makes sense. I think what we're going to do is stick with you, Tim, and instead of going from left to right, we'll continue on another question with Tim and then work backwards. One of the things I'm struck with is that we've really talked about the underlying technology, but let's talk about the database itself now. Why is it we can't just take our database as it is, buy a new drive, mount our data section on that drive, and just turn it on? You guys are spending a huge amount of engineering dollars and research, and I think that you really can't do that — you really are rewriting things. Tim, can you talk a little bit about why you really had to redo things from scratch?

I think we're unique on this panel in that our technology existed pre-flash, and there were aspects of our implementation that lent themselves to working very well on flash. But there is one area I can certainly speak to, and it's kind of a challenge: compression, which compared to our competition in the MySQL and Mongo space is really compelling. We've heard the name Fusion-io kicked around, and some of these high-end devices are really expensive. If you get 10-to-1 compression on your data with a live-running transactional processing database, suddenly you're making one-tenth the investment in the flash. So on the expensive flash, you can store ten times as much data, or you spend one-tenth the money to get the same work done.

But the challenge with compression — and this is what I can speak to directly here — is that it moves the problem. So when we did really deep compression at Tokutek, in TokuDB: if you had to do an I/O to bring the data off disk, let's say you'd measure that in milliseconds, five milliseconds or so, to read a chunk of data off the disk, and with the hardest, the greatest compression, something like LZMA, it might take a couple of orders of magnitude less time than that to decompress the data. So in the pipeline of the operation, if I need to get data off disk, it takes five milliseconds to read the data, and a couple of orders of magnitude less to actually decompress it. Suddenly, on flash, that I/O is nearly free — let's say the I/O is measured in microseconds. Now my decompression takes longer than the actual I/O operation. I've kind of moved my bottleneck, and the expectation of speed.
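Here's a small illustration of the bottleneck shift Tim is describing. The latencies are rough, assumed figures for reading and decompressing one block — not measurements of TokuDB or of any particular device — and the codec names are just stand-ins for "heavy" and "light" compression. The point is only how the decompression share of the pipeline changes when the I/O underneath it gets orders of magnitude faster.

```python
# Illustrative latency budget: where does the time go when reading one
# compressed block? All figures are assumptions for a single block read.

CASES = {
    # (read latency, decompress latency) in microseconds
    "spinning disk + heavy (LZMA-class) codec": (5000, 400),
    "flash device  + heavy (LZMA-class) codec": (100, 400),
    "flash device  + light (LZ4-class) codec":  (100, 40),
}

for name, (read_us, decomp_us) in CASES.items():
    total = read_us + decomp_us
    share = 100.0 * decomp_us / total
    print(f"{name}: total {total:>5} us, {share:4.1f}% of it spent decompressing")

# On the spinning disk the decompression cost disappears into the seek time;
# on flash the same heavy codec dominates the pipeline, which is why a lighter
# codec can win overall even though it compresses less.
```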
You know, people get much faster I/O, so they want to go even faster, and then it becomes a challenge of balancing how much you do for compression, how much time you spend, and how that affects the running workload. So it makes for an interesting challenge, and sometimes I'm finding that an easier, lightweight compression algorithm might be better on flash because of the latency when you do an I/O. So that's the challenge we're facing: the decompression and the compression start to take over and dominate the access times, and we have to be really careful about that.

So the I/O time has gone from milliseconds to microseconds, and therefore it used to dwarf decompression, and now it is dwarfed by decompression. Okay. You mentioned MySQL and Mongo. You guys basically build a layer of software that serves as an intermediary: people write to the Mongo API, for example, but they're in fact using your software as the driver for all the I/O. Is that a good summary?

An easier way to think of it is that you can think of TokuMX as a fork, for lack of a better term, of MongoDB. We started with their source code, and with their API we are fully compatible with the functionality. So TokuMX is to MongoDB what TokuDB is to MySQL — it's another plug-in alternative.

But now you have basically your own code base, so people that have a Mongo database can just pull over and use the same API effectively.

Exactly.

Okay, great. All right, let's go ahead — tell us a little bit about why it is that we can't just take a database designed for spinning disk and have it run on a solid-state drive.

Well, first of all, that's a great question. And the short answer is: you can. You know, I've mentioned Fusion-io a couple of times, but their big selling point was they just said, hey, look, use one of our cards underneath Oracle or MySQL, and guess what? You can keep all of your existing hardware and software spend and make your application two to three times faster. Wouldn't it be nice to just, you know, sprinkle some magic hardware pixie dust over your databases and not have to increase your Oracle license fee or do anything other than simply install some cards? And that's a wonderful story.

Is that the right ballpark? And can you do better?

Well, exactly. That's the rule of thumb I've heard when I've talked to people who have optimized — you know, operations guys. The point is that the number of IOPS in these drives — we've talked about them being nearly free — has moved way up. A rule of thumb for a nice, swank, recent enterprise rotational drive is maybe 200 IOPS, 200 I/Os per second, to maybe 400 for fast drives. And flash devices started out at maybe 10,000 to 20,000 in the early days with the X25, and have moved up to hundreds of thousands today, even for modestly priced drives. So, you know, that's a huge difference, to go from 200 to 20,000, right? That's a factor of 100, not a factor of two. So when we looked at building Aerospike, we asked: what's actually going to unlock all 20,000 of those IOPS? How can we get out of the way? How can we invoke the parallelism necessary for flash devices? And it turns out that it comes down to the internal architecture of the controllers and the chips. You know, with a rotational drive, you've got one head, you seek to one place, you do that read, and then you get out of town. And so staging, doing elevator optimization algorithms, was a big part of rotational drive optimization for the last 20, 30 years. With flash, you really do have random and parallel access.
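A minimal way to see that parallelism for yourself is to issue the same random 4K reads at increasing levels of concurrency and watch throughput climb — which is roughly what a purpose-built tool does far more carefully. This sketch is illustrative only: the file path is hypothetical, the thread pool stands in for a real async I/O interface, and a serious test would open the file with direct I/O so the operating system's page cache doesn't mask the device.

```python
# Sketch of the queue-depth / parallelism effect: the same number of random
# 4 KB reads issued with 1, 4, 16, or 64 requests in flight at once.
# Illustrative only; a real benchmark (e.g. fio) uses O_DIRECT and native
# async I/O so the page cache and Python overhead don't hide the device.
import os, random, time
from concurrent.futures import ThreadPoolExecutor

PATH = "/path/to/large/testfile"   # hypothetical multi-GB file on the SSD under test
BLOCK = 4096
READS = 20_000

def random_offsets(file_size, n):
    """n random block-aligned offsets inside the file."""
    return [random.randrange(0, file_size // BLOCK) * BLOCK for _ in range(n)]

def run(queue_depth):
    fd = os.open(PATH, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offsets = random_offsets(size, READS)
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            # Each worker does positional 4 KB reads; more workers = deeper queue.
            list(pool.map(lambda off: os.pread(fd, BLOCK, off), offsets))
        elapsed = time.perf_counter() - start
        print(f"queue depth {queue_depth:>3}: {READS / elapsed:>10.0f} reads/sec")
    finally:
        os.close(fd)

if __name__ == "__main__":
    for qd in (1, 4, 16, 64):
        run(qd)
```

On a rotational disk the numbers barely move with depth; on a decent SSD the deeper queues should pull several times the throughput of one-at-a-time reads, which is exactly the effect the panelists design their engines around.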
So the first thing we did was work out a purely RAM-based indexing scheme that was incredibly efficient, because a RAM system will be highly parallel: it allows you to get through the index, allows you to actually do writes to that index, and if you have to shuffle and rebalance whatever you're doing, you don't have to take all kinds of locks if you use a proper lock-free algorithm. That then unlocks the ability to, at the device level, do multiple reads and writes in parallel and really unlock the throughput of a drive, and get yourself up to the full rated chip throughput of the drive. Those and a few other techniques, such as using streaming writes to the device to make sure it doesn't invoke its own internal garbage collection system — those are some of the techniques we've used, as well as parallelism. The goal is really to get the drives running at their full rated speed according to the manufacturer and according to synthetic testing. If you can do that and really get to your hundreds of thousands per second, then you're doing something right, and you're going beyond the 2x to 3x of taking your relational system and just dropping it onto some new hardware.

Interesting. Okay, great. Dave, do you want to take a crack at that question then? We'll unmute here, Dave.

Yeah, thank you. Yes, I was just complimenting the other panelists on their excellent answers. I'll definitely echo some of the same things. It is awfully disappointing to drop an SSD into a database and have it only get two or three times faster. You're sort of like, that's not what I expected — I wanted some zeros at the end of that. So, you know, well-engineered systems have balanced bottlenecks. And database engineers 10 years ago balanced bottlenecks between compression and decompression and seeks and scans and all of this stuff. But when you go and move one of those things by something like three orders of magnitude, it obviously goes out of whack. So it's taken a little bit of time for everybody to figure that out.

Drilling in on one of the deeper issues here: if you had asked people four years ago how you would take advantage of an SSD in a database — and you can find papers on this, on SSD-optimized data structures — the answer was to do sort of a log-structured merge style pattern. Something that does random reads to the flash disk but is aware of the fact that flash can't do small-block random writes: it has to actually erase big chunks of flash, so basically it has a hard time natively doing small random writes. And so people were saying, well, if you can read randomly from flash and write sequentially to flash, and structure your data store in that way, then it will work much better with flash. I guess the bet that we made back when we started FoundationDB was that that was not going to be the case forever. That problem of turning random writes into sequential writes using log-structured techniques was ultimately going to be solved, again, at the controller level — not to harp on the controllers here. And I think that's what we see. So if you look back to the rotational disk days, those drives could do random reads and random writes basically at the same speed, and there was some Q-depth effect — like, for example, the drive might be able to get about three times faster if you give it a very, very big list of things to do versus giving it one thing at a time.
What's been interesting about the evolution of SSDs is that, as the controllers have improved, internally they're solving some of these log-structured problems. So today an SSD actually does look an awful lot like a faster disk. It can do a roughly equal number of random reads and random writes. And the biggest difference, other than just the orders of magnitude of performance, is the Q-depth effect. An SSD might be able to run five or even ten times faster if you give it a big list of items to read and write, versus doing one at a time. And from what I've been told by people at some of these companies that sell these flash memory cards, what we're going to be seeing in the future is even more I/Os per second — again, I/Os being nearly free in flash — but you're going to need to drive more and more concurrency to an actual flash device to get that performance. So in terms of thinking ahead to challenges for us, writing random 4K blocks isn't much of a challenge. The challenge for us as software vendors is to be able to provide hundreds, potentially, of parallel requests going to one of these flash devices at the same time.

Fantastic. So let me just remind everybody that we're at about 1:30 here in the Midwest, and we want everybody to put in questions for the panelists. We have some of the best experts in the field here, so it's always good to get some free advice. We're going to keep the discussion going for about 20 more minutes, and then we'll open it up for questions. So I'm going to just kind of open it up now, in any order you guys want: any case studies of industries that are really starting to benefit from this new generation of databases? Anybody can jump in here — if you have any case studies you'd like to mention, some success stories, any specific industries too.

I'm happy to. We've got a lot of these things in production.

Do you have any specific industry segments that you guys work in?

Oh, yeah — I just wanted to see if the other panelists might want to jump in as well. So, the industry that we've been very successful in is advertising and marketing technology companies. We're now starting to find some lift within retail sectors and companies as well. The primary use case is essentially user profile storage and sort of session cookie tracking. These are the cases where, for, say, advertising optimization or personalization of a web experience, you'd like to have maybe 10 terabytes currently online, and you want to do that with a very high level of availability, and you're doing maybe six billion objects in a table. That seems to be sort of what a lot of our customers are doing — somewhere between one and six billion objects per table — and the access is very random across it. So in cases where it's a very read-centric workload and the working set is small, you'll often find that existing relational technologies are working pretty well. It's really where the working set size is expanding. With a flash-oriented system like Aerospike, you can say, hey, look, your indexes are in memory, and at a reasonable cost you'll actually push, say, 100,000 transactions per second per node in a clustered system. Right now, in the use case of what's called real-time bidding within advertising — advertising optimization — there are nearly hundreds of companies participating in real-time bids for advertising impressions.
So, if you really want to visualize it, imagine you go to a web page and there's that little banner at the top, or some holes on the side, that are going to get filled with ads. Well, that doesn't just go to one computer that has a list of ads and serves you back one. There's a massive infrastructure where, let's say, that goes to Google as the first stop. Google may fill that ad, and then send it out for bid to maybe 50 to 100 companies, all of which have to respond to that bid, and they usually have to do it within 10 milliseconds. So that load is currently running at nearly 750,000 database requests per second — sorry, advertising requests per second. That's 750,000. So imagine you want to play in that game, if you're a company like AppNexus, one of our large customers, or if you're AOL's advertising group, another one of our customers, et cetera. You actually need to be able to push that on a regular, second-by-second basis within 10 milliseconds, and you've got to do your own math. So you've got one database lookup to do, and then a lot of math to do, to decide among all your campaigns which one is the optimal bid and at what price. So that's the use case: you've got to look up some user information. You've usually got a huge Hadoop stack on the side which, for any given user, is sweeping over all of its behavioral data every six to 24 hours, updating segments, updating models, and pushing that toward the front edge of your service. And then, at this very high throughput rate, you're making bids on these advertising impressions.

You'll also find cases in market research and retailing where companies like JPMorgan are using companies like [x+1] to provide an in-depth, 360-degree customer view of everyone that comes to their website, people who have accounts, to figure out how to place products. [x+1] is one of our customers that also uses the Aerospike database — not quite in the advertising real-time way, but to do calculations and store all of the information and all of the recent behavior. So essentially, popping up to the big picture for Aerospike, it's really all about user interactions. That's where you need that level of read capability and write capability. You have a very large working set, because it's everyone on the Internet, and you have these big data clusters, Hadoop clusters and such, out there, capable of generating very interesting insights into what to place first.

Very good. And the characteristic of that is that they need very, very fast response time for a request. You know, running a bidding process and having all those bids collected in under 10 milliseconds seems like something that would be challenging to do if you have rotational disks in there.

Yeah, that was really the first use case, and it was also a big driver of flash. Our first customer who deployed us, nearly three years ago — over three years ago — said to me on a call: look, the economics of my business simply do not support RAM. We've done our modeling, and we know how much information we need to keep per user, and we know how much money we can make on the open market, and RAM's just a non-starter. So with those guys, and most of the guys in the advertising business, they are super aware of what their TCO is, and they know that they can't lose money on every transaction and make it up in volume. They have to be making money on every transaction, and that means hardware cost. But what we're also seeing now is companies like Facebook, right?
So, at the Flash Memory Summit a couple of months ago, Facebook made the rather staggering admission in their keynote that they were buying an exabyte of flash storage this year. The slides are available for download from that site, Flash Memory Summit. And the reason they're doing it is not really this millisecond bidding case. It's about how to do all of the database requests necessary to present a really rich experience to Facebook customers — shares and thumbnails and recent activity of other users. I mean, just imagine the massive number of transactions required. So part of what we're seeing in the use cases is about transactions per second and super low latency, but it's also coming into the rich web experiences, as shown by Facebook's buy and some of Apple's recent work on things like the iTunes Store and some of their online properties. It's about rich customer interactions.

Dave or Tim, do you guys have any markets that you're getting into or case studies you'd like to share?

I have more of a horizontal than a vertical from a market perspective. We're working with lots of customers, in many successful businesses of all sorts and sizes, where they've built an application on MySQL or on MongoDB, they're successful, and they just want to run faster. And I think, by all measures, I/O is generally the bottleneck on MySQL and MongoDB applications, or it can be a big contributing bottleneck. Oftentimes the attempt to fix that bottleneck is to put in flash — to put in PCIe flash cards. The challenge with doing that is, as we just talked about, flash isn't exactly cheap: its cost is definitely in the middle, and probably closer to RAM than to spinning disks. So oftentimes we'll run into evaluators and customers who want to put flash under their application to make it more performant. They're doing OLTP and they're doing OLAP on the same data, so they've got lots of reads and lots of writes going on. But they just can't afford to solve their problem without compression. So the horizontal I'm describing is that type of customer: someone for whom flash would solve their performance problem, but unfortunately makes their infrastructure too expensive to support the application. And introducing high compression generally equalizes that: with 5x compression or 10x compression, suddenly you need one-fifth or one-tenth the flash you would need in uncompressed form — which is generally how the data sits. MongoDB does not do compression, and in MySQL, while there's a compression option, you generally don't use it because it severely damages the performance characteristics of MySQL. So that's what we're seeing lots of: customers who have been successful, the app needs to go faster for them, but flash is too expensive in its uncompressed form. And that's where TokuDB and TokuMX come in, to give people that capability through compression.

So, a summary: if they're I/O bound and they don't want to change their software stack, they basically use your software as the underlying layer — for example, if they're running on Mongo, they'll just do a forklift upgrade to your software, they don't have to change their API, and they'll get much faster throughput.

Generally, changing the software stack is going to be, you know, orders of magnitude more expensive than staying in the same application stack.

All right. So Dave, let's wrap this one up with you. Tell me what industries you guys are looking at and where you're seeing opportunities.
Yeah, I think we've run into a lot of the same industries as the other guys — consumer apps, some games — but I'm interested in addressing this pricing question, and I'll try to tell you something that I tell a lot of the customers of FoundationDB across verticals, and that's about flash: you really just have to figure out whether it works with the cost structure of your business. There are a lot of different cost structures for businesses in the world, and I would say that for a lot of them, the price of the storage medium really isn't that relevant to the business. If you are running a big business and you have 100 gigabytes of data to store, or even 10 terabytes of data to store, it doesn't make a difference — I mean, to be honest — whether you're storing it on disk or flash, or in memory for that matter. So that's one of the things that I see over and over again: people initially come in and say, well, I don't know if I want to spend the money to store my data in flash because it's, like, X times more expensive. But a lot of the time we convince people that that's not the right way to think about it — that you should actually go look at how much data you're talking about storing and then try to put some numbers on it, some actual dollars. And whether it's $20,000 or $100,000 or whatever, you should be asking whether the cost is worth it, not whether the factor is worth it. And I think what you're seeing is that people, even like Facebook and like Apple, are saying, you know, even though the price is astronomical, the benefits are astronomical as well, and so we're going to do it. I think that's an important way to look at the cost issue.

I totally agree. I've seen a couple of case studies where people are running a cluster of 50 nodes, and just by adding solid state drives they've been able to run the cluster on 25 nodes, and that really makes it easier to manage, and saves on heat and housing and all these things. So I totally agree: it's not just a matter of counting the bytes and figuring out the cost, but really figuring out what your SLAs are and how many nodes you need to service that SLA. And what people are finding, from what I'm seeing, is that by using solid state drives they can use a lot fewer nodes in their NoSQL clusters.

I can chime in on that for a second. That's hitting on a use case I'd like to echo as well. Recently, flash itself is outperforming flash, if that makes sense — I'm not just seeing rotational versus flash, but flash versus flash, where our older customers are now refreshing their infrastructure. They're going from some of the first Samsung drives, the SS805, where they put four 100-gig drives in a machine and had cluster sizes of about 40 nodes, and they're driving the same load now with Intel S3700s at 12 drives, and they're taking a cluster size reduction from the 40s to under 10, and giving themselves headroom.

And that's the more modern flash drive?

Absolutely. They're going from 400 gig to nearly — I think it's nearly two terabytes per node, and it's not that much more expensive for the cluster, and they're getting over 100,000 database transactions per second per node with Aerospike. So all of that means flash is even beating flash.

That's great. I'd like to echo what he said there and give a special shout-out to Intel on the S3700 drives, because those things are pretty awesome from our perspective. As I said, I have virtually
every consumer SSD that's been introduced to the market in the last four years, and what Intel has done with this drive is they've brought basically consumer-SSD-like pricing — maybe a 2x premium — but with increased enterprise performance, really reliable, just better in every way. So they deserve, I think, good recognition for that effort. One more thing to talk about on the price issue, if you don't mind.

Let's get to some of the questions, because we've got some good questions rolling in for our group here. But go ahead.

All right. I think something that's interesting in today's world is Amazon EC2, and maybe other cloud providers, because nowadays, if you're running an application on — you pick the technology; it doesn't need to be MySQL or Mongo, it could be any data storage technology — people have been running on EC2 for a while, running on EBS or ephemeral disks, and they might have guaranteed IOPS. We all live in a world where, if you're technically savvy, you can create servers — you can spin up a number of EC2 instances that are flash-based — and for not a lot of money run a test and see just how much better performance you can get, and know the price immediately. Amazon's going to give you the calculator, and it's really different. You know, I have a picture on my phone from when I benchmarked some Fusion-io and other flash cards four years ago, and back then you begged the vendor to get a sample to try something out, and you wait, and you install it and configure it, and it's weeks of time. What's really changed here is that for a couple hours of work you can see the difference in performance without having to spend much money at all. You know, it's a blip in your budget — maybe 50 or 60 bucks — to run an Amazon cluster with SSDs for a couple of hours.

That's interesting, in that now, regardless of the database technology or vendor, flash or no flash at all, virtualized environments — yeah, clouds are good ways to test an architecture without doing a capital buy of the drives. And if it works out there, you have a good chance of getting it to work effectively within your data center too, or you can even just stay there. So let's get to the questions here. If everybody has their Q&A panel open, you can see some of the questions coming up, and I'm just going to pick out some that we can answer quickly. Are people applying RAID architectures to configurations of solid state drives? Anybody want to take that one?
So this gets into actually a kind of interesting topic, which is: what we do at Aerospike — and I'm sure the other guys would agree — is that in order to get a lot of parallelism, you want the software to do a lot of the management of I/O queues and workloads itself, which works in contrast to having a RAID card or the operating system trying to manage things like the RAID stripe. So what we've found in general is that we require our customers to get that RAID card out of the way as much as possible, to turn off RAID and to present JBOD interfaces. With the recent LSI cards it's very important to use what's called the LSI FastPath, and we have some notes on our website, and I did a High Scalability blog post, for those of you who are followers of that blog. So I would say RAID as a concept — you still need to spread the data — but the standard RAID striping mechanisms and standard RAID cards really get in the way, in our experience. It's really all about doing that at the database level and having the database understand a bit more about data layout.

So RAID tends to hide the details of how things are actually happening, and databases need that information to really optimize the I/O. And that's pretty consistent across a lot of these systems. Another question: how important is the block size? Do you guys customize block sizes for I/O and things like that? An operating system with 4K block sizes really doesn't seem to apply when you're working with some of these big data systems — Hadoop with its 64 megabyte block sizes.

I'll try to take that one. I think it's somewhat important, but I think what you'll find is a lot of people are using block sizes of basically 4 to 8K, and this basically has to do with what I sometimes call the characteristic size of a storage device: basically the size where the marginal cost of the seek roughly equals the marginal cost of the bytes read. So, for example, on an SSD, if you read a 1 megabyte block, the seek is going to cost you far less than the reading of the 1 megabyte, whereas on a rotational disk you actually can seek to a spot and read hundreds of K in roughly the same amount of time. So one thing that you see with flash drives is, if you can read a gigabyte a second and do 100,000 reads or writes per second, then you can back out what the sort of workable or characteristic size of that device would be, and usually for an OLTP-type workload you want to have a block size which is around that size. So we sort of come back to where we started a long time ago: that's 4K in a lot of cases, and 4K is the sort of native block size of a lot of these SSDs, so that's what a lot of these systems use. But that's how I think about it. (There's a short sketch of that arithmetic below.)

I'll chime in on one point. When people talk about those kinds of block sizes, they're often talking about the file system block sizes. And, you know, for most databases — older-school databases, Oracle certainly — bypassing the file system is one of your first optimizations. So one of the tricks of a flash-optimized database is to really do the kind of trade-offs that Dave was just talking about and pick the right block size for the right moment, but it's going to be using direct access, so the operating system's block size doesn't necessarily come into play.

Tom has a question here: do you guys know of any studies about different devices and failure rates and performance and predictive wear analysis on these devices, or are these devices so new that there aren't good third-party objective studies yet?
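A quick check of the characteristic-size arithmetic Dave walks through above. The streaming bandwidth and IOPS figures are round, illustrative numbers rather than measurements of any particular drive; the calculation just divides the bytes per second you can stream by the random operations per second you can issue.

```python
# "Characteristic size": the block size at which the cost of one random
# operation roughly equals the cost of the bytes it transfers.
# Device figures below are round illustrative numbers, not measurements.

def characteristic_size_kb(bandwidth_mb_per_sec, iops):
    """KB you can transfer in the time one random operation costs."""
    return bandwidth_mb_per_sec * 1024.0 / iops

# Rotational disk: ~150 MB/s streaming, ~200 random IOPS
print(f"rotational: ~{characteristic_size_kb(150, 200):.0f} KB")        # ~768 KB

# Flash device: ~1 GB/s streaming, ~100,000 random IOPS
print(f"flash:      ~{characteristic_size_kb(1024, 100_000):.0f} KB")   # ~10 KB
```

That's why a rotational disk rewards reading hundreds of kilobytes per seek, while on flash the balance point falls back down into the 4-8K range the panel mentions.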
I haven't found any. We've personally had 96-SSD banks fail in highly correlated ways — in other words, we came in the next day after leaving our cluster running some tests over the weekend, a cluster with 96 SSDs, and we had over 20 irrecoverable drive failures. I won't name the vendor here — just, you know, email me or something — but these stories are not unique. I've talked to many people that have these stories, which I think is why, as an industry, all of us are so glad that we have somebody with, you know, the resources and credentials of Intel and some of these other vendors, and a little bit more maturity in this space. Because, you know, averages don't make sense in these contexts. Old spinning drives have these nice bathtub curves — Google has some awesome data about this — but I would say if you're running a lot of the same SSD in a big cluster, you should try to make sure it's from a vendor that you trust.

We're coming up to the top of our hour here. Are there any closing thoughts that you guys have, future directions, on when we'll stop using rotational drives for most database transactions? Let's start here.

My prediction is three years.

Okay, three years. Brian?

Predictions are always interesting. I think three to five years. It'll be interesting to see what happens in the flash and SSD market. There's already some stratification between consumer, enterprise class, and high enterprise class, and maybe there'll be more differentiation between the various lines, just like there is in spinning disks — you can get a terabyte for 60 bucks, or you can get a terabyte for, you know, a whole lot more, enterprise grade. And I think that's going to continue in the flash world: you really do get what you pay for in many cases, and drives will be optimized for different types of workloads.

All right. Dave?

Oh, no predictions from me. You know, at FoundationDB we already have pluggable storage engines that work with memory and SSD, and I think, if anything, in the coming years we'll be adding support for rotational disks, believe it or not. But that's just because there are always going to be tiers of data that are hot or cold, and you just can't get around that. What I'm seeing is that SSD represents a sweet spot for a lot of applications, and that sweet spot is only growing — it's both eating into memory and eating into rotational disk, I believe. So I think that sweet spot is going to continue to grow, and that's my only prediction.

Fantastic. Okay.

One point about this, after my short prediction. What I love about SSD and flash is — and application developers, forgive me for saying this — it lets them be a little sloppier. They can write code that maybe does a few more seeks and is a little less hand-optimized, and build some great UIs and great proofs of concept, and just know that, you know, the hardware guys are going to help them out. I'm seeing more interesting apps, more interesting optimizations, just because the power of SSD can be behind them.

Okay, great contributions, and we'll look forward to another NoSQL Now webinar in our series next month. I want to especially reach out to thank Shannon Kemp and Tony for hosting us, and we'll look forward to seeing you guys in the future. Thanks, everyone. Bye-bye.

Thank you, Dan. Thank you, Brian, Tim, and Dave, and thanks to all of our attendees, who are always so interactive, which I just love. Again, I'll reiterate Dan's sentiments. Have a great day. Bye now. Thanks. Goodbye.