All right, everybody, let's get started. This is Dave Vellante. We're here live from Wikibon headquarters, and I'm here with my good friend and colleague John MacArthur of Walden Technology Partners, who's a great Wikibon contributor. Hello, John. Hi, David. Thanks for coming in today. So we are here, we're broadcasting live. If you want to watch live, we're on Justin.tv slash Wikibon. That's Justin.tv/Wikibon. And I would suggest that if you're going to watch live, you turn down either your phone or your PC, but don't have them both running, because there's a slight delay in the audio stream. Today's Peer Incite topic is cloud archiving: forever, without losing a bit. We've got several great guests on today; thank you very much for joining us, folks. Let me first introduce Justin Stottlemyer, who's the director of storage architecture at Shutterfly. And I do have to ask: if you're not speaking, could you please put your line on mute? Star six will do that, or I can do it for you. Just bear with me for a second, folks. Okay, so Justin is the director of storage architecture at Shutterfly. We also have joining us Sebastian Zingaro, who is co-chair of the SNIA SIG on cloud archiving and an R&D architect at HP. And also a co-chair of that same SNIA SIG is Chad Thibodeau, who is with Cleversafe; he's the director of product management at Cleversafe and also handles alliances over there. So can you guys all hear me? Welcome, gentlemen. Thank you. David Floyer is also on the call. I think, David, you were with us earlier, so good morning. So let me set up the call very briefly and then we'll jump right in. Organizations are increasingly finding that traditional array-based storage is not meeting the cost and data integrity requirements for cloud archiving. Less expensive approaches are emerging, and computer scientists are suggesting a fundamental change to the way in which we protect and disperse data. Erasure coding is increasingly gaining a foothold in the marketplace over brute-force replication. This Peer Incite is going to look at those trends, and in particular at the use case of cloud archiving, and we've got a number of experts joining us today. So I'd like to start by setting up the call with an overview of what the SNIA SIG is doing. Sebastian and Chad, I wonder if you could talk a little bit about the key requirements the SIG is looking at and what you're trying to accomplish. Let's start there and then we'll talk a little bit more about the taxonomy. So Sebastian, maybe you could lead us off, please. Sure, thank you. So I'll give you a brief overview of what we do in SNIA. For those of you who may not know, SNIA is the Storage Networking Industry Association. It's basically people from storage companies, and the mission of SNIA is to promote standards, technologies, and educational services to empower everything related to storage and the management of information. There are different groups inside SNIA, and one of those groups is the Cloud Storage Initiative, which is working on the cloud storage area. So for example, one of the standards that SNIA is promoting is CDMI, which is called... Sebastian? Yes? Sebastian, this is Chad. Your connection's a little bit tough, you're breaking up a little, and I don't know if you're hearing that. That's true. Chad, do you want to expand on this?
Yeah, sure. So let me go through some of what Sebastian was starting to outline. The Cloud Archive and Preservation mission is to advance the use of public, private, and hybrid clouds for archive services and long-term data retention. The key is that we sit underneath the Cloud Storage Initiative, or CSI, within SNIA, which is responsible for developing the CDMI specification, the Cloud Data Management Interface standard. It's really the only industry standard right now that's been proposed to tackle the challenges of getting data from one public cloud storage provider to another. Today there exist proprietary approaches but not an industry standard, and that is really the goal of the CSI group as well as the goal of the CDMI spec. Our group is obviously much more focused on archive and long-term preservation and the challenges that go along with it. Some of the things we're trying to tackle are migration challenges from one public provider to another, and data format challenges relating to long-term data preservation: as files and formats evolve over time, ensuring that customers or businesses have the ability to access that data across those different formats. Some other things we're looking at are the other requirements that the long-term preservation industry has, the people who are referred to as curators. And it's much more than just accessing objects; it also gets into the realm of e-discovery and management, basically trying to do a better job of organizing all of that data from a long-term perspective rather than a short-term one. And I think that's really what Justin is going to jump in and talk about as he goes through some of the challenges that he has at Shutterfly. So that's kind of the premise for the Cloud Archive and Preservation group. A couple of other things just to set the context, as David alluded to in the beginning: we are proposing a couple of terms as well. And again, this is from the point of view of SNIA; similar definitions exist in other organizations, but we're trying to put context on them from the SNIA perspective. So for example, there's this taxonomy term, the Cloud Digital Archive Service, which is basically a cloud-based service specializing in an online storage repository for the purposes of compliance, litigation support, or retention for extended periods of time. With that idea, we're also alluding back to leveraging the other organizations within SNIA, and specifically the CDMI specification, to achieve that archive service. So that's an example of a taxonomy term that we're submitting. There are other challenges, but rather than go into a lot of depth on them, we'd really rather spend the time hearing from Shutterfly and from Justin on what he's doing. But let me just put a couple of plugs out there for our group. We're going to be at Cloud Burst; for those listening in, Cloud Burst is a conference that happens at the end of September, and it actually follows the Storage Developer Conference. If you just do a Google search for Cloud Burst, you'll see information on that. And we'll also be presenting at SNW, which is in the fall, around the October timeframe. Again, if you go to the website or do Google searches for Cloud Burst or SNW fall 2011, you'll get a lot more information.
We will be presenting at those, and that's where people can learn a lot more about the Cloud Archive and Preservation group. And last, if you go to www.snia.org, under cloud archive, there is a webpage dedicated to our group. You don't actually have to be a member of SNIA to access that, and there's information there. So that's a quick summary of our group. I guess, David and David, if you want to open it up for any questions for our group before we go into the Shutterfly piece. Yeah, Chad, I had a clarifying question. It sounds like the SNIA SIG is really focused on a lot of the same requirements that we know and love around archiving, right, John: immutability, provenance. What about things like changes in the technology over time? In other words, will the hardware that I use today be able to support this down the road? Are those issues part of the discussion as well, Chad? Correct, yes, exactly. You bring up a good point, David. There are really a lot of different areas that we could go into. We're a brand new group as of January, so we're really trying to focus on a couple of areas to start with where we can make an impactful difference and help educate the industry. Our expectation is that it's going to branch into many different areas, but yes, that's absolutely correct. Okay, so let me stop there and open it up. Does anybody have any questions specifically as it relates to what SNIA is trying to do with the standards? I'll throw another one out there just to get the discussion going. A lot of people are skeptical of standards bodies, and SNIA has often been criticized despite the good work that they're doing. My understanding is that the CDMI standard is advancing at a much more rapid pace than what we're used to. Can you confirm that, and why is that? Yeah, actually, that's the perfect segue. I forgot to mention that right now SNIA is in the process of submitting the second revision, version 1.0.1 of the specification, to ISO, and they're also working with other standards bodies. So I agree that in the past there have been challenges SNIA has come up against in trying to define certain standards. I think the success they've had with the CDMI spec is, first and foremost, because they've learned a lot from past experience, so they're definitely trying not to repeat those mistakes. They're making sure to engage with the other standards bodies like ISO, NIST, and IEEE, so I do believe CDMI will be successful. And right now there is nothing else really being proposed that's going up against it, so I think it does have some good momentum. Yeah, this is John. You mentioned a number of areas: data migration, data preservation, file formats. When you think about the specification, it'd be good to have some discussion around which areas are some of the more difficult ones to actually reach consensus on. Yeah, to make it a brief answer, because that can go many different directions, I think one of the challenges with the current CDMI specification is that it's really meant to address public providers, and consequently there are many different ways that people can support public clouds in terms of different metadata options and interface options.
So the CDMI spec is trying to be broad in that respect. However, I think one challenge that may end up happening is that when people actually implement it, they may take exception to parts of the specification and still claim compliance. That's one of the challenges when you're trying to make a broad spec: by doing that, you maybe give people too much flexibility, so they have to keep an eye on that. But outside of that, it's actually going very, very well. I've participated in some of the technical work groups, and you've got some big industry companies participating in the spec development, and it's going surprisingly well. Okay, let's talk a little bit about Shutterfly and the case study there. Justin, if we could turn to you. Shutterfly is just an awesome service; it's a personal publishing service. You go to the Shutterfly website and right there they tell you: we store your pictures for free, it's unlimited storage, and we keep them at full resolution. So right away, John, we're seeing some really challenging storage problems. Justin, I wonder if you could start with some background on Shutterfly and your role there, and then let's dig into the particular problem that you had to solve. My role at Shutterfly, obviously, is director of storage architecture. Background here: I started about a year and a half ago with the challenge of taking the existing data store, which was at 19 petabytes of raw disk at the time, and coming up with a new and cheaper solution to maintain the archive long term. When I first started, Shutterfly was doing anywhere between seven and 30 terabytes of new data per day, and that obviously has to be maintained long term. And looking at the different components of that growth: as cell phones and other cameras become much more ubiquitous for every single person in the world, our user base continues to grow, and so does the amount of photos rolling in, day in, day out. Sorry, was there a question? Nope, please continue. Okay, so obviously with the promise of free, unlimited photo storage archived forever, you're going to be buying an awful lot of storage and maintaining it for a lot of users. That also means costs continue to go up, and, mean time between failures being what it is, failures keep increasing as you end up with more and more components. So with 19 petabytes online, I'd been here about four or five months, investigating a slew of technologies, before I determined what I wanted to go with moving forward, which in short was an erasure-coded object store with RAIN embedded. So effectively, being able to do an N-plus-M style erasure coding mechanism, as opposed to something like RAID 5, which is N plus one, or RAID 6, which is N plus two. With N plus M, I could go 10 plus six, or eight plus five, or whatever the case may be, not to get into specifics. And obviously using RAIN allows us to do something of a commoditized nature on the back side, where we can throw relatively generic hardware at this storage platform.
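For readers following along, here is a minimal, toy sketch of the N-plus-M idea Justin is describing. It is purely illustrative and is not Shutterfly's or Cleversafe's implementation; it encodes k data bytes into n = k + m shares by evaluating a polynomial over GF(256), and any k surviving shares are enough to reconstruct the original data. Production dispersal systems use systematic, heavily optimized codes, but the tolerate-any-m-losses property is the same.

```python
# Toy N+M erasure code over GF(2^8), for illustration only.
# k data bytes become n = k + m shares; any k shares reconstruct the data.

# GF(256) arithmetic (primitive polynomial 0x11d)
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_inv(a):
    return GF_EXP[255 - GF_LOG[a]]

def poly_eval(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... over GF(256) with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def encode(data, n):
    """Treat the k data bytes as polynomial coefficients; share i is the
    polynomial evaluated at point i+1, for n distinct nonzero points."""
    return {i: poly_eval(data, i + 1) for i in range(n)}

def decode(shares, k):
    """Recover the k data bytes from any k surviving shares (dict of
    share_index -> byte) by solving the Vandermonde system with
    Gauss-Jordan elimination over GF(256), where addition is XOR."""
    rows = []
    for idx, y in list(shares.items())[:k]:
        x = idx + 1
        row = [1]
        for _ in range(k - 1):
            row.append(gf_mul(row[-1], x))
        rows.append(row + [y])
    for col in range(k):
        piv = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(v, inv) for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a ^ gf_mul(f, b) for a, b in zip(rows[r], rows[col])]
    return [rows[j][k] for j in range(k)]

if __name__ == "__main__":
    data = [0x53, 0x68, 0x75, 0x74]       # 4 data bytes ("Shut")
    shares = encode(data, 4 + 3)           # a 4+3 scheme: tolerate any 3 losses
    for lost in (1, 4, 6):                 # simulate three failed fragments
        del shares[lost]
    assert decode(shares, 4) == data
    print("recovered:", bytes(decode(shares, 4)))
```

In a real object store the same arithmetic is applied stripe by stripe across large fragments on separate nodes, which is what lets a rebuild read only the data that was actually lost.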
So, Justin, I'm sorry to interrupt you; I just want to let people know, for those who may not be familiar with the terminology or want some more background: on Wikibon, wikibon.org, right in the center of the page under Professional Alerts, there are two new articles that went up just recently. The second from the top, right under Professional Alerts, is Perpetual Archiving for the Cloud; it's got some diagrams in there, and we may be discussing those in more depth. And then there's another one on erasure coding and cloud storage eternity, which David Floyer wrote; it describes erasure codes and how they apply. So go ahead and check those out. Sorry to interrupt, Justin. All right, I want to check those out too. So, it ended up that we actually had a bit of an incident last year, where one of our traditional storage arrays had a controller issue. This is a two-petabyte storage array, and it ended up losing 172 drives in one shot, which cost us parity across the entire system. Now, we had no data loss, and obviously we'd prepared for that kind of event, but still, a two-petabyte array going offline, or close to it, is not a pleasant experience that anyone wants to go through. It ended up taking us three days to calculate parity back to one parity bit, and almost three weeks in total to get dual parity back across the entire system. That just helped validate my position that a nearly shared-nothing model, using erasure coding on RAIN, was really going to be what it took to get to the level of redundancy and security that we need for our archive. It ends up that today we're at roughly 30 petabytes of raw disk. We're seeing about 40% year-over-year growth, on average a 25% year-over-year increase in image size, and there doesn't appear to be any slowing down of the amount of data coming in the front door. So, Justin, I wonder if I could again level-set for the audience that may not be as familiar with these issues, and we've talked about this before on Wikibon Peer Incites: as disk capacities grow, the time it takes to do rebuilds on a failure becomes onerous. Right, exactly. I'll give an example. Today, in a traditional RAID 6 style array, if you lose a two-terabyte drive, it's going to take you anywhere between, let's say, 50 hours on an almost completely idle array that can do nothing but copy data in and out, to maybe as long as two weeks if you're running on a really busy array that has to share IO on the back side and has very limited resources to do the recovery. So you've got the issue of limited resources, and the biggest issue is that you're exposed during that rebuild time, and the probability of data loss dramatically increases. Is that right? Absolutely. So for each parity bit you add on the back side, you effectively end up with roughly 100x the reliability. Obviously the math changes somewhat as you add parity bits, but it's a pretty good rule of thumb. And there are actually a couple of components there. On a traditional storage array, for example, if you've got an 8+2 RAID 6 stripe on the back side, then no matter how full the parity group is, say it's 10% full, if you lose a drive, you actually have to reconstruct the entire drive to rebuild the parity group.
With some of the newer approaches, say an erasure-coded object store, you only actually have to recover the missing data. Now, you might have to read that from more components; if you're doing a 12 plus three, for example, you might be reading off many more components, but that means you also have a lot more resources to be able to do so. And worst case, you're still at roughly a RAID 6 style reliability model, with roughly 100x improvement for each parity bit beyond that. So, assuming you had a drive failure, if you use that same comparison to a RAID 6 rebuild, what's the range of times it's going to take to rebuild using an erasure code? It depends on the throughput of the system and a couple of other things, but I've had drives recover in a matter of one to two hours. In my current example, just because the drive only had a small amount of data on it, I had a lot of resources left to go through and rebuild it, because it was effectively just a traditional file-system copy of the data on it, rather than having to reconstruct an entire array. Now worst case, of course, say I'm 100% full on that erasure-coded segment or on that drive; then it would have to do a full rebuild, and it could potentially take a bit longer, as I don't necessarily have the resources of a million-dollar array behind my commodity hardware platform. That said, I've still got three or four or five other parity bits that can carry me through until that's done. So I'd say worst case I'm actually slightly slower to recover, but with a much greater amount of redundancy; best case, which has been the case so far, I recover in a matter of hours. So are you in the process of migrating over from RAID 6 to erasure code, or what's the status of your environment now? I don't want to give away all the secret sauce, but effectively we're doing a combination of a lot of different things. We've still got a lot of traditional arrays, obviously, and I don't think those are going to go away anytime soon. I actually think that traditional array storage vendors are at some point going to catch on and probably implement erasure coding in their controllers on the back side, as opposed to traditional RAID mechanisms, but that'll be a big change from how users are used to addressing storage. In the meantime, we've got a traditional massive archive problem. I mentioned that when I started a year and a half ago we were at 19 petabytes; today we're over 30. So we're constantly migrating data all over the place, as well as handling a massive influx of data on a day-to-day basis. At any given time we're not only migrating data on the back side to keep our disk IOPS balanced across the entire site, we're also getting a big influx of data; so we're doing both is the short answer. Okay, so drive failures in your environment, at that many petabytes, are a daily occurrence, multiple times a day probably? Correct. There might be the odd day where we don't see a drive failure, but typically we see at least one to two per day. So your primary driver was peace of mind during that rebuild process? Well, yeah. I mentioned before that I had five days with no parity, right?
And when you still have a big array like that, you obviously not only have to stripe the arrays one way, you then have to stripe the volumes built across them the other way. So effectively, any single additional drive failure at that time would have cost me essentially the entire array. I would have had 1.8 petabytes of data down that I would have had to pull back up off of tape, or out of a temporary second copy that I might have had there; either way it would have been an ugly situation. So yeah, absolutely. Today, anytime we have a drive failure on the traditional side, you're keeping an eye out for the second one, and if the second one goes, you're in a real bad place until you actually get the first parity bit recovered. Now, if I have a drive fail on the new platform, there's no rush to get it replaced; I've still got multiple other parity bits that can cover the workload in the event of a problem. Okay. As well as the silent corruption problems that exist within SATA at scale, or any system at scale, effectively; it's not just hard failures, it's soft failures that you need to account for. Drive unrecoverable read error rates right now are on the order of one error in 10^14 to 10^16 bits, depending on the drive type, which means that roughly every 12 to 100-plus terabytes you read, you're going to get an unrecoverable read error off of a drive. That's a soft failure, an unrecoverable read error, not a hard failure where the drive is down and you're recovering it. Now, that error will still pull from RAID or erasure coding to be recovered, but it means the effective chance of having a dual failure or a triple failure increases dramatically as you get to multiple petabytes like I have. So I could potentially run into the problem, and this is the scary part, where I don't have a drive failing at all, and I have two or three unrecoverable read errors within a segment. Granted, it's highly unlikely, but it's altogether possible. Yeah, and this is the cloud perspective, because we're talking about petabytes, not terabytes, here. And I wonder if we could talk a little bit more about the economics. From a business case standpoint, clearly there's a risk factor that you can point to. What about other cost factors when comparing erasure coding approaches with traditional RAID? What did you find there? So there are a lot of ups and downs. I'm effectively building an internal cloud, which makes things much easier. When you're dealing with an external cloud, you start paying for someone else's disk and for someone else to maintain it, as well as inbound and outbound network on their side as well as your side. So an external cloud really didn't look to make much sense for me, especially at my scale. Not that I haven't evaluated it; I have looked at it. For a small-scale storage startup, the advantages of going with an external cloud are huge, because you've got no investment time, no investment costs; you just outsource it and get rapid iteration on your development to get going. So for small companies, I would say it's actually highly beneficial and almost certainly the way to go from an ROI perspective. For larger companies, I think we're going to continue to look at internal clouds.
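To make the reliability argument concrete for readers, here is a hedged back-of-the-envelope sketch in Python. It is not Shutterfly's model; it simply computes the probability of losing data under an idealized assumption of independent fragment failures within a rebuild window, which is enough to show why each additional parity fragment buys orders of magnitude. The 1% per-fragment failure probability is an illustrative placeholder, not a number from the call.

```python
from math import comb

def p_data_loss(k, m, p):
    """Probability that more than m of the k+m fragments fail, i.e. the
    object can no longer be reconstructed. Assumes independent fragment
    failures, each with probability p during the window of interest."""
    n = k + m
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m + 1, n + 1))

# Aside: a URE rate of 1 in 1e14 bits is roughly one expected soft error
# per 12.5 TB read (1e14 / 8 bytes), which is why petabyte-scale reads
# hit them routinely.
p = 0.01  # illustrative per-fragment failure probability per rebuild window
for label, k, m in [("RAID 5-like  8+1", 8, 1),
                    ("RAID 6-like  8+2", 8, 2),
                    ("dispersed   16+4", 16, 4),
                    ("dispersed   20+6", 20, 6)]:
    print(f"{label}: overhead {m / (k + m):5.1%}, P(loss) ~ {p_data_loss(k, m, p):.1e}")
```

At these illustrative numbers, each extra parity fragment buys one to two orders of magnitude, and the wider schemes win decisively at similar overhead; real figures depend on drive failure rates, rebuild times, and correlated failures, which is why the "roughly 100x per parity bit" figure is a rule of thumb rather than a constant.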
Now, I was specifically brought in here to greatly reduce costs. I was hit with a huge cost challenge of getting down to, I think, 33% of what the existing storage was costing, and you can imagine that when you buy two or three petabytes at a time, you already get a decent bulk discount, Costco style. So we had to take a relatively novel approach, and again I'll leave some of the details out. People start to say, with erasure coding, well, you're getting plus three or plus four or plus five on these parity bits, so why not just do multiple copies if you're going to have that much parity anyway? You may as well have two or three copies of the data. The truth of the matter is that with a dynamic erasure coding model of, let's say, K plus M, if you're traditionally in an 8+2 RAID model, you could be doing a 16+4 erasure coding model and end up with the exact same amount of overhead, but with something like 10,000x the reliability from a storage perspective. And on top of that, you're effectively doing it on commodity-style hardware, rather than on very traditional big-iron storage that has a very expensive hardware cost. So you made an interesting point in your comments: if you're a small company that hasn't invested a ton in existing infrastructure, you might want to think about starting your archive in the cloud as opposed to creating your own internally. You definitely have to have a model, though, I would say, for when that's going to start costing you more on one side than the other, and have a plan for getting in and getting out. Good point. So that then very much brings in the question, which I heard from the SNIA representatives, of the importance of having a data migration mechanism. I assume that at 30 or 40-plus petabytes you're not looking at how you're going to migrate that data offsite to a public cloud. But if I'm somebody who started in the public cloud and I'm thinking of either bringing it in-house or migrating between public cloud providers, then I need to think very much about those issues. Do you have any advice for people who are at that decision point? Yes: keep multiple copies of your metadata. The other thing is that we actually checksum every single image; we're image-based here, so we checksum every single image going in and going out, because we're constantly doing migrations. We have a read-copy-verify step as these jobs occur when we're migrating massive amounts of data. It's not enough that we just read and move the data; we have to verify and validate that it is still what we think it is. Any of these unrecoverable errors could occur at any given time, and it could be a silent corruption problem. So anytime we write data to anywhere, we effectively do a read-verify and a SHA-1 check to guarantee our data consistency end to end. And when you're doing that with a public cloud, it's even more important. So I'm imagining that with all those checks, the time window for migrating large amounts of data is fairly extended. Absolutely; you're effectively not only copying it, you're then reading it back.
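As an aside for readers, the read-copy-verify pattern Justin describes is easy to sketch. The Python below is purely illustrative, a local-file stand-in rather than Shutterfly's migration tooling; in practice the destination would be an object store reached over HTTP, and the verified digest would be recorded in the metadata database for later integrity checks.

```python
import hashlib
import shutil
from pathlib import Path

def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-1 so large images never have to fit in RAM."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src: Path, dst: Path) -> str:
    """Copy src to dst, read dst back, and compare digests.
    Returns the verified SHA-1 so it can be stored with the object's
    metadata and re-checked on future reads or migrations."""
    expected = sha1_of(src)
    shutil.copyfile(src, dst)   # stand-in for an HTTP PUT to the object store
    actual = sha1_of(dst)       # the read-back step: verify what actually landed
    if actual != expected:
        raise IOError(f"verify failed for {dst}: {actual} != {expected}")
    return expected
```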
So it definitely creates a bit of overhead. Now, that said, there are definitely ways you can optimize it, and effectively, if you read it back relatively quickly after you write it, you can still catch it while the file handle is in cache, and a couple of other little things. But at the end of the day, you're still going to be reading terabytes, hundreds of terabytes, or petabytes of data back. The point is: validate your data. In talking about this idea of a small startup using the public cloud, avoiding capex up front, and then maybe over time taking things in-house to reduce costs, there's a great case study on Wikibon that Dave Cahill wrote about Zynga. If you just search on Wikibon for Zynga, you'll see the case study. It's a really fascinating look at how they're bringing a sort of commodity cloud in-house, and it sounds like you've done something similar. And of course, your background at Facebook and eBay, where costs are pretty fundamental, gave you some street cred, I guess, to do this. You said you were tasked with reducing already relatively low storage costs by, you said 30%, is that correct? No, to 33% of where we were. Wow, okay, great. Big number. And actually it's funny you mention Zynga, because when I was coming to Shutterfly I had had some talks, and played some golf, with a VP of operations over there about doing that for them as well, so I'm intimately familiar with the Zynga model of how they did it. That's exactly what I was talking about. That's a great case, right? They had massive expansion. They would have had huge issues keeping up with demand internally up front; they're a big dev shop, and they didn't necessarily have the operations workforce or experience in-house already to expand out their storage infrastructure or their CPU infrastructure. So they relied on Amazon, with EC2 and S3, to do so. But after six to nine months of that, they're still wildly profitable, but somebody senior there can take a look at the numbers, see how much money is going out the door to Amazon, and think, you know what, we could probably do this cheaper in-house. Yeah, and that's exactly my point. And they've built a true hybrid cloud on commodity components with Amazon Web Services plus what they call the zCloud, which is a very fascinating case study. So, okay, I wonder if we could come back to the erasure coding. I think we understand why erasure coding, but what did you specifically do? I mean, erasure coding is not mainstream, is it? How did you find it? What did you use, what was the solution that you deployed? Can you talk about that a little bit? And as part of that, can you also talk about the intellectual property components of erasure coding: are there patents around it, and who owns those patents, and so on? Yeah, so let's take those separately. What did you do, what was the solution that you deployed, and why? Well, I'll start a little higher level on that. I started with a list of requirements for what I needed to have happen, and eventually that's how I got down the path to: okay, I need erasure coding and I need RAIN. And from there, I just started trying to track down companies, software, open source projects, anything you can think of, to get down that road.
And so I ended up actually working with Chad, who's on the call, and Cleversafe for our deployment, and it's been fairly successful to date. We've had a lot of back and forth; I've got a lot of ideas around how we should deploy things like this at scale. That said, I looked at open source projects from UC Santa Cruz, like Ceph, which was not ready for prime time; I looked at Tahoe-LAFS; and there are other companies out there with competing products. But obviously I was shooting for best of breed, to make sure I could get done what I needed to get done and have my challenges met. And at the time, they seemed to have a much greater foothold in the market, even though they're still a relatively small startup; there are multiple years of development time behind this. So when you get down the path of, okay, there are these two or three open source projects and one, maybe two, viable companies that can already do this, you get to the build-or-buy decision of whether or not you could be successful with any of them. Working with Cleversafe: I believe they use a Reed-Solomon style erasure code, and they've done a lot of optimization, a lot of changes to make it dynamic, I believe, so it can be scaled up or down as they see fit. Which, effectively, RAID 5 and RAID 6 are also erasure codes; they're just fixed width. So we ended up doing a lot of testing and a lot of deployment internally. And at the same time, today, for example, we take in a write from a customer, it goes to a single array, it gets checksummed back and forth through the application and validated, and it also gets written to a second array at the same time and validated. That let me ensure I could implement this new technology alongside, even though it uses a different mechanism; it's no longer traditional block-based storage, it's a new style of object store. I put in a lot of requirements for how the software has to maintain back-and-forth availability between all of these components and how it's going to maintain metadata consistency between all of them as well. Right, so you're talking back and forth to multiple databases. You're maintaining object IDs, which is another problem with an object store: your metadata is no longer part of the file system, your metadata is now an object ID. It gets handed back to you over the wire when you do an HTTP PUT, or however you push the data into your object store. So maintaining metadata consistency becomes even more important, because you can't just check the data on the file systems; there's no ls, there's no fsck to recover the file system. It's either there or it's not for your metadata. And so we actually maintain not only our traditional metadata store, which was an Oracle database that also maintains the file system data, because anywhere you look, when you have an application that talks to a file system, the application isn't actually reading the file system; the application is getting metadata out of a database that then talks to a file system.
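For readers, here is a minimal sketch of the kind of object-ID bookkeeping Justin is describing here and continues below: record each object's ID and verified checksum in more than one independent place, including an append-only log that can be replayed to rebuild the metadata in the worst case. The put() interface and store names are hypothetical stand-ins, not Shutterfly's actual schema; in their case the stores are an Oracle database, a MongoDB collection, and a log file.

```python
import json
import time

def record_object(oid: str, sha1: str, primary_db, secondary_db, log_path: str) -> None:
    """Persist the object ID and its verified checksum to two independent
    stores plus an append-only log, so the metadata can be cross-checked
    and, worst case, rebuilt by replaying the log.
    primary_db and secondary_db are assumed to expose a simple put(key, value)."""
    entry = {"oid": oid, "sha1": sha1, "ts": time.time()}
    primary_db.put(oid, entry)        # system of record (e.g. a relational table)
    secondary_db.put(oid, entry)      # independent copy (e.g. a NoSQL collection)
    with open(log_path, "a") as log:  # durable, human-readable fallback
        log.write(json.dumps(entry) + "\n")
```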
So it's not that huge a leap to explain to developers how to shove an OID, an object ID or globally unique ID, into a database; it just lives alongside in a new table or a new column. But we're maintaining that in multiple different ways to guarantee consistency. Not only do we have it in one database, we shove it into another database. We're doing a lot of work with Mongo at Shutterfly, so we're actually using a MongoDB database as well, a NoSQL DB, and we also shove all the metadata components out into a traditional log file, which, worst case, means you can just read the log file back and rebuild your metadata if you have to. In addition to that, we're doing some other consistency checks and components around metadata that I'll call secret sauce for now and leave it at that. Cool. John, to your point about intellectual property: Cleversafe is a relatively new company, a small startup in Illinois, kind of stealthy even; they've had some funding and that was sort of quiet up front. They do a lot of work with military and government applications, so a lot of the customers don't talk. Along those lines, they didn't find me, I found them. You found them, interesting. And I had to look hard in order to do so. I had lunch with a buddy who is another storage guy, and I was talking about what I was working on and trying to build, and he said, can you look at these guys? I think that was how I found them. It came up over lunch; it was not a Google search or anything like that. I don't know if I mentioned it initially, but Nick Allen turned us on to Cleversafe a couple of years ago, and he's sort of a trend spotter in new technologies. And I know I was talking to Chris Gladwin in the spring, John, when Cleversafe issued a press release around some new patents; they had secured five new patents. But the interesting thing to me is that Chris told me they had 65 patents pending, and then he talked about the number of claims associated with those patents: it was literally hundreds. Normally a small startup maybe has a couple of patents. This is a very patent-heavy company with a bunch of MIT alpha geeks; even the head of marketing is an MIT alpha geek. So it's like you've got to have that in your DNA, I guess. Okay, so I want to do a check here. We've been dominating the conversation with a few people, and I know there are folks online that may want to ask some questions. I had to mute some lines because we had some background noise, so I'm going to slowly unmute them. But while I'm doing that, I wonder if I could bring in David Floyer. David, you there? Yes, I am. Could you talk a little bit about your perspectives on this? We've been hearing about a lot of the benefits and a few of the drawbacks. Keep us honest. What's your view of this whole discussion? What are some of the gotchas that practitioners should be thinking about? I put a piece up on erasure coding so people can see examples of it. But the key is that as you increase the number of fragments, the number of components that you can break the data down into, the availability for the same amount of resources goes up absolutely astronomically.
With the case study, if you've just got a single copy and that's at 99% availability, then with the same equipment you can raise that to 99.9999 and keep adding nines; it's just a huge increase. I just want to add a little anecdote there, which is that based on the math of RAID 5 and RAID 6 reliability, I theoretically should have been losing 136 megabytes of data per year at the size of my archive. Now, my data durability number, according to the math, as far as I can believe it, is that I can maintain data reliability for around five million years. Right, well, that's the math of it. It's amazing, isn't it, the increase? So as a technique, it's clearly going to be valuable as the size of disks gets bigger. The downside, very clearly, is in terms of how you manage this. It requires a lot more resources, compute resources, to actually do it, and you need completely different types of management of the validity and the identity of all these blocks. So that's the downside: you have a completely new method, and these new methods take a lot more processing time and effort. Obviously for archiving, that's one of the reasons why it's suitable to begin with, even though the compute time is high. And because the compute time is high, especially if you're spreading it over different nodes and different locations, the response time is obviously higher as well. So the trade-off is that response time is slightly higher and the amount of compute is higher, but the availability is much better and the cost is much lower, if you think about availability, cost, and performance as the three corners of the triangle and the three trade-offs. Those are the trade-offs that you have here. So David, you've been watching, sorry, go ahead. Just to finish off with where it's going in the future: as Justin said, these techniques can be used within arrays just as much as between nodes, and I think there's going to be fairly rapid adoption of these types of techniques within storage arrays as well as across clouds where you've got multiple nodes. By when? Can you give us a timeframe? If you think in terms of seeing new arrays with this sort of capability, I think you'll see the first coming out within a couple of years, and then it'll be a three-to-five-year normal migration time. But certainly within two years you'll see the first of these coming out. Is this the dominant cloud use case in your opinion, this sort of archiving model? As the number of cores increases on the processors and we get more compute power, I think this is actually going to become dominant for all types of storage. The math is just too strong. I agree with that completely. We've really dominated the conversation; I want to stop and give everybody a chance to chime in. I've tried to unmute almost all the lines except the ones that are marked as rogue lines. So if you have any questions for anybody here, in particular Justin Stottlemyer of Shutterfly, please chime in: questions, comments, insights. Please share. Well, my question really was around processor trends, increases in compute capacity versus storage requirements and access density. When you stack those up and you look at the increased compute requirements of this type of approach, are you comfortable that the processor trend will stay ahead of the requirements? Absolutely.
And I suspect that there will be specific ASICs or whatever it is to help achieve these sorts of capabilities. Justin, you're actually doing some work in this area, aren't you? A little bit. There are a couple of different things I've looked at doing, including some bit-for-bit compression-style work, and I was looking at doing some custom ASICs for some of that workload. But it actually ends up that, between the amount of CPU and FPU you require to do the erasure coding, I'm running out of network bandwidth before I run out of CPU. Even today's CPUs are very powerful. Now, I am splitting out my erasure coding tier from my storage tier, so effectively I'm doing a two-layer architecture where both layers are horizontally and separately scalable. My erasure coding tier is just stacks of Intel boxes, effectively, so if I start to run out of CPU, I just pop another one into the mix. And I'm only running on dual quad cores right now, so there's a lot of headroom left. I'm running on basically the best value points of the cost-performance curve; if you really wanted to crank it up, you could go with the latest and greatest from Intel. I'm just waiting for those prices to drop to keep things as commoditized as possible; I'm trying to remain on the safe side of that cost-performance curve. And N-minus-one technology, I'm presuming. Kind of N plus M again there, actually, but yeah, effectively. Can I ask what the relative controller-to-disk cost ratio is? Is that in line with the arrays, or...? No, it's much, much lower. I'm not even close to that cost; I've actually got that cost included in my overall 33% target. Right. It's much, much lower. David, what's a normal ratio, is it one to one? A normal ratio is about 10 to one. Ten to one, you're saying disk to controller? The raw cost of the disk versus the cost of the array: buy the disk at Fry's, the Fry's-disk model. Ah, okay. So you're saying the premium is 10 to one; you're paying array vendors 10x what you would pay for raw storage. Normal, in enterprise storage. I'm just trying to get the numbers out. And Justin, you're saying your ratio is substantially lower? Okay, great. How about disaster recovery, can you talk about your business continuity strategy? With 2.2 copies of the data, I could put 1.1 copies in one location and 1.1 copies in another location and have much greater reliability than even three or four full copies of the data, potentially. So that'd be one potential model. Today we're effectively in one site with offsite tape. Okay, interesting. So you've got basically... Effectively, part of the problem is that when you've got 20 petabytes of traditional storage archive and somebody goes to the board and says, okay, you've got to build this again, here's what it costs to buy another 20 petabytes of storage, who are you going to sell that to? It's a really tough sell. Today, with greatly reduced storage costs, it's probably something we'll be able to tackle.
Okay, so today you've got a single location, which helps with performance, and you're going to tape. So if you had to recover from tape, it would take a while, but you could do it. Right. Okay. I have a sick feeling in my stomach that when I tried to unmute all these lines, I actually muted them; it happens sometimes when I play with the mute buttons, and people call me up afterward and say, I was trying to ask a question. We have a lot of people online, so if you can't get a question in and you want to tweet me at @dvellante, I'd be happy to ask it, or if anybody else whose line is not muted cares to chime in, now's the time. We've just got a couple of minutes left. What types of hardware are you using there, and what have you got on the back end? How are you arranging things inside that RAIN architecture? The hardware is actually custom. We're running completely custom hardware in the racks, so I've been product and project manager for spec'ing my own metal and getting all of that work done. We've done a lot of testing around performance, availability, and reliability, trying to keep overhead costs roughly the same as traditional RAID. So with a 20-to-25% overhead, or burn, model for RAID, I'm actually running something like 16 plus six in my RAIN right now in some cases, and I have some that'll be as high as 20 plus six. Wow. Yeah, so you can get a huge amount of availability from 16+6 or 20+6, exactly. And again, the next set I'm looking to roll out will actually be 20 plus six, so those six parity bits will be more than enough to cover even a lot more data than I'm looking to throw at it, but I'm going to remain on the safe side. And why has the network become the bottleneck, and what do you think the solutions are? Do you have to go to InfiniBand or something like that for something local? I actually think some of it is just interrupts on the bus, potentially, so I'm not entirely network bound there. But I can hit 70, 80% utilization on the network port with a single CPU right now, and obviously theoretically you can get to 100%, so right now I'm probably at a decent balance of CPU to network; I'm pretty close to peaking both out around the same time. I think, not necessarily in advance, but basically as 10-gig copper becomes more ubiquitous through the data center, which I'd imagine almost anyone doing greenfield is deploying today, that changes. In addition, there are companies like Xsigo that, if you're going greenfield, can give you massive boosts in performance at a relatively low cost compared to doing a traditional top-of-rack network design; they can drop your costs phenomenally and give you 40-gigabit throughput on a single box. So I think that some of the new technologies like Xsigo, which is probably going to be acquired by someone, I'd imagine, or just copper 10 gig in the data center, which has a fraction of the problems of fiber 10 gig and a fraction of the cost, as production ramps up and they become the ubiquitous model in the data center, that'll really be where the network starts to take off and allow more bandwidth back and forth. Latency won't change much, but you'll have a lot more pipe, and at the same time we'll be seeing increases in CPU over time, and those will come down the price-performance curve, so hopefully they'll kind of coincide for me, at least, around the same time.
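A quick footnote for readers on the numbers just mentioned: the raw-capacity expansion of a dispersed scheme is simply (k+m)/k, which is what makes wide stripes so much cheaper than keeping full copies. The sketch below does that arithmetic for the configurations named on the call, plus the two-site "1.1 copies per location" idea from a moment earlier; it is a back-of-the-envelope illustration under those stated assumptions, not Shutterfly's sizing model.

```python
def expansion(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a k+m dispersal."""
    return (k + m) / k

for label, k, m in [("16+6", 16, 6), ("20+6", 20, 6), ("8+2 RAID 6", 8, 2)]:
    e = expansion(k, m)
    print(f"{label}: {e:.2f}x raw capacity, {m / (k + m):.0%} of disks are parity")

# Two-site idea from the call: roughly 1.1 copies' worth of fragments per site,
# about 2.2x raw capacity in total, versus 3x-4x for three or four full replicas.
print("two-site dispersal: ~2.2x raw vs 3.0-4.0x for full copies")
```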
What does this mean for the enterprise? Is that a reasonable question to ask? I think it's absolutely a reasonable question, and I think the short answer is that the enterprise mostly has nothing to worry about. We're helping to set a trend here, I think, and it's the beginning of a change, but people have been declaring the death of all sorts of technologies for a very long time, and this industry has proven to be very adaptable and very much an evolving market, and I'm pretty sure it will continue to do so. Brocade has purchased other companies and technologies to further themselves, Hitachi has done the same, EMC has done multiple, right? They effectively have Isilon now, which is basically erasure coding. So all of these traditional companies are going to continue to evolve and change, but as always, they're not startups; they're not going to be the first movers, they're not going to be taking those first steps. They're going to be playing catch-up like they usually are and then trying to use their economies of scale, and that's not going to be so easy with commodity-style platforms. So I think you'll see Dell, you'll see IBM, you'll see these guys that already have a foothold in the commodity market really be able to take a lot more advantage of it. Was that Tim who asked that question? It was indeed. Hi Tim, how are you doing? Thanks for coming on. Okay, we've got just about a minute left if anybody has one last question. If not, we're going to wrap. This is Alex Williams of SiliconANGLE. I was just curious about the practical integration, thinking again from the perspective of the enterprise customer who's seeing this scaling of the storage platforms that they have. So it can be difficult, but I don't think it's nearly as impossible as people tend to think. Maybe I'm just an eternal optimist, although being a storage guy, I sort of doubt it. Like I said before, when you're dealing with metadata and changing metadata mechanisms, whoever your application developer is was already working with a database and maintaining metadata in some way, shape, or form. So in most cases you're trying to add an additional copy of that metadata in a new form, with potentially a lookup table in an Oracle database, as well as potentially a new mechanism by which you push data in or out. You may have been using a completely traditional file system before and just doing block-based workloads, but as you scale up larger and larger, that's probably not a very good solution at absolutely massive scale. You want something that's much more distributed than a single IO going through a single system at any given time. You want to be able to effectively use the model I'm using, or an S3-style model, where you can distribute it; and yes, you'll see slightly increased latency, but you're going to have to give up something somewhere for massive scalability in your archive. Okay guys, we have to wrap. Thank you, Alex. Great session. I want to thank Justin Stottlemyer, who's the director of storage architecture at Shutterfly, and also the SNIA colleagues, Sebastian Zingaro and Chad Thibodeau; thank you very much for coming on. Sebastian, I'm sorry that we had some trouble with the line. I'd also like to thank David Floyer, Tim Stammers, and Alex Williams for your questions and your participation today.
And my colleague and partner John MacArthur and Jeff Kelly who's manning the cameras and the tricaster, thank you very much. Just as a reminder, right after this meeting we go into the after call and we develop themes for six research notes that we'll have posted within 24 hours on the Wikibon site. We'll be consolidating those into a newsletter so look for those, feel free to hit the edit key and improve the pieces or write your own or write a wiki tip. We appreciate the contributions from our community and we thank you very much for listening today. So look for that information. Thanks for coming on and we'll see you next time. Bye for now.