All right, welcome everyone, and thanks for coming. My name is Rob Daly, I'm a systems engineer for SwiftStack, and we're going to talk to you about optimizing hardware for OpenStack Swift. This is my co-worker Eric. "Hi, I'm Eric Jolson, also a systems engineer with SwiftStack." Great. So today we're going to start with an overview of Swift, just to set the table. I'm sure there are people in the room who know a lot about Swift, and others who may not be as familiar with it, so we want to take a couple of minutes to lay that groundwork. Then we'll get into the decision tree: what are the choices you make about the hardware you choose for Swift? Eric is going to talk about a couple of reference architectures we've used as building blocks for some of the deployments we've done through our company, SwiftStack. Then I'll talk about optimizing the builds: the things you need to look out for and be aware of when you're making those choices. We'll talk a little bit about benchmarking, and of course at the end I've got to plug SwiftStack a little and tell you what we do for Swift and how we make it easier to deploy and manage. We'll follow that up with some Q&A at the end if we have time, and we should. So, the Swift overview. Why Swift object storage? Codenamed Swift, this is the object storage platform for OpenStack, and the key point here is that it's software. I think a lot of people here have heard the term "software-defined storage" ad nauseam, but we do get put in that category sometimes. Swift was designed for unstructured data, and it threw away the assumptions of
file and block solutions that were made for a different era. File and block have been out there for 20 or 30 years, some of these POSIX-type file systems, and making them scale to petabytes, to millions and, for some of our customers, billions of objects, is not so easy using those traditional methods. Object storage was created to throw those assumptions away and rethink the way we do things. I think collectively we can agree that scale-out has really won over scale-up; in some cases that might not hold, but generally speaking the Googles and the Facebooks of the world have figured out that scale-out is the model to follow, and Swift is definitely following in those footsteps and blazing a trail for object storage. So how is Swift used? This is a crash course in Swift, but I want to make sure people understand it. Swift is an HTTP API. Like I said, it's not a mountable file system. But don't get caught up in thinking you need to be a programmer to access it; there are a lot of tools out there, clients, applications, backup tools, and in fact SwiftStack has its own web client that lets you drag and drop right in your browser into that API. The format of a typical object is in the form of a URL, and there are three really important pieces: the account, the container, and the objects themselves. Think of your account as your namespace within your Swift cluster; within that you own containers. Think of containers in the traditional sense as directories, or in Amazon terminology as buckets; really they're just collections of your objects. That URL format is important to remember.
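To make that URL anatomy concrete, here's a small sketch that splits a Swift-style storage URL into its account, container, and object pieces. The host name and the AUTH_acme account are made-up examples for illustration, not anything from the talk:

```python
from urllib.parse import urlsplit

def parse_swift_url(url):
    """Split a Swift storage URL into (api_version, account, container, object).

    Container and object may be None: a URL can address an account,
    a container, or an individual object.
    """
    path = urlsplit(url).path.lstrip("/")
    parts = path.split("/", 3)          # /v1/ACCOUNT/container/object...
    parts += [None] * (4 - len(parts))  # pad the missing pieces
    version, account, container, obj = parts
    return version, account, container, obj

# A hypothetical object URL: account AUTH_acme, container "backups",
# object "2014/05/db.tar.gz" (object names may contain slashes).
url = "https://swift.example.com/v1/AUTH_acme/backups/2014/05/db.tar.gz"
print(parse_swift_url(url))
```

Note that everything after the container is one object name, slashes and all; Swift has no real directories underneath a container.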
The determination and placement of objects throughout the system, where they land and how they get placed, is driven by that URL, and I'll talk about that. We talked about unstructured data; what's also great about Swift is that there are zero single points of failure. Everything is designed to be a massively scalable system that is entirely redundant at all times. Eventually consistent: that's a really important concept for Swift. In traditional terms, eventual consistency is like saying I have asynchronous replication. What it means is that when I place an object into the Swift cluster, Swift wants to create three replicas of it (that's the default, and you can change it if you need to) and place them in the most unique places it possibly can. Those most unique places might be three disks inside one chassis in one server, or they might be a server down here in Atlanta, one up in New York, and maybe one out in LA. You design the Swift architecture based on what your failure domains are, and Swift is able to route around failures. Jokingly, we sometimes say we welcome failure, we welcome disks that fail, because Swift is able to withstand those types of things. So what are the most common use cases for Swift? You can guess a lot of them, but I bring this up to get you thinking about your use cases, because they're always going to drive a lot of the hardware decisions we make, or at least be a major factor: image data, log files, video rendering, anything that is generally unstructured data that you need to scale almost infinitely at times. These use cases are going to help you make decisions on the hardware choices you make.
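That "most unique places" idea can be sketched in miniature. This is purely illustrative; Swift's real ring uses consistent hashing over partitions rather than a brute-force search, but the sketch shows the region, then zone, then server preference order described above. The device list is hypothetical:

```python
from itertools import combinations

def spread(devices, replicas=3):
    """Pick `replicas` devices that are 'as unique as possible':
    maximize distinct regions first, then zones, then servers.

    Illustrative only -- Swift's real ring uses consistent hashing
    plus tiered as-unique-as-possible placement, not brute force.
    """
    def uniqueness(combo):
        return (len({d["region"] for d in combo}),
                len({d["zone"] for d in combo}),
                len({d["server"] for d in combo}))
    return max(combinations(devices, replicas), key=uniqueness)

devs = [
    {"id": 0, "region": "atlanta", "zone": 1, "server": "a"},
    {"id": 1, "region": "atlanta", "zone": 1, "server": "a"},
    {"id": 2, "region": "atlanta", "zone": 2, "server": "b"},
    {"id": 3, "region": "newyork", "zone": 1, "server": "c"},
    {"id": 4, "region": "la",      "zone": 1, "server": "d"},
]
print([d["region"] for d in spread(devs)])  # three distinct regions win
```

With only one server available, the same rule degrades gracefully to three distinct disks in one chassis, which is exactly the small-cluster case mentioned above.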
Eric and I are going to show you some rules of thumb and some reference architectures for choosing the hardware you're going to use, but your use cases will definitely come into play. So let's talk a little bit about the architecture tiers for Swift; I hope everybody's good, because that was the quick crash course in how Swift works. The Swift architecture tiers allow you to break out a bunch of the functions within Swift. Swift has the concept of a proxy layer, and the proxy is just a piece of software in the data path as we accept new objects into the cluster; that proxy is responsible for spreading the data around the available capacity in the cluster. That's just one layer. We also have an account layer, which is responsible for the account namespace I talked about earlier; the container service, another piece of tracking software; and then the object services themselves. The beauty of the tiering is that those four services can all sit on one box, like we have right here. We call that PACO; one of the guys here in the front row coined that term, an acronym for proxy, account, container, and object. PACO means I can place all those services on one node, and in fact some of the smaller architectures we'll be talking about start with the PACO idea. The beauty, again, is that I can start with PACO, with all those services on one node, but over time, if I want to break that out into separate tiers, I can split the proxy nodes away from my account, container, and object services. Why would I want to do that? One reason, think of maybe the Hadoop example, is that I can now disaggregate the compute side from my
storage side. Proxies are going to handle a lot of requests, a lot of the API calls for the data I'm putting into and pulling out of my cluster, so they're more compute-intensive: more network shuffling, and in some cases SSL termination, so you need a bit more horsepower on the CPU side. But I don't need a direct ratio of proxies to object servers; maybe I have far fewer proxy servers than object servers. That's where the next possible tier comes in, and this is probably the most common medium-to-large case we see: placing the proxy servers separately, and giving the account and container services solid-state drives. The accounts and containers are metadata about the objects, not the objects themselves; that's a point to make clear. It's metadata about the objects: who owns them, how many objects there are, how much capacity is in each container. All that information is stored in the account and container services, and for that type of metadata I need quick access. That's a very big change from the file system structures we're used to; file systems in most cases make it difficult to break the metadata out from the actual files or objects. This architecture lets me tier that as well: I can tier out the proxies, the accounts and containers, and the objects as three completely separate tiers. Those are for massively scalable architectures. OK, so let's talk a little bit about some of the hardware choices that are out there. There's always a decision tree. What's number one in everybody's heart? Price. I've got to get the lowest cost for the number of terabytes I need to deploy in my
cluster, and I need to squeeze every penny out of this thing, so price of course is always big. We have architectures from some name-brand vendors, and we also have a lot of customers that use white-box approaches; again, the beauty of Swift is the ability to withstand failure, so maybe I can also withstand less reliable hardware. That's not to say one vendor is better than another, but it's a consideration this architecture allows me to make. Reliability, to me at least, is more a function of what the Swift software does: how many replicas, how many times do I want to store those objects, how much redundancy do I want to build into the system, how many locations am I going to use? Is all this data going to sit in one data center, or will it be replicated across three, four, five, six different data centers? Performance is a key metric for everyone: how large is it going to be, what are the networking implications, and what load-balancing services do I have? Then finally, scale. This is a tough one, because when Eric and I interface with customers, a lot of times we'll ask how large the deployment is going to be in 12 months and what it will look like in 24 months, and most of the time (I'd be in the same spot) people shrug their shoulders and say, "I don't know; I'm growing at a rapid rate and I need to solve this problem." So scale is at the very bottom of this list for a good reason. Now, some example architectures. I'm going to let Eric talk about these; they're some of the reference architectures he helped build. "Not to add too much to what Rob was saying, but there are considerations for when you decide how to start out your cluster."
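To see how replica count feeds into that price question, here is a rough cost-per-usable-terabyte sketch. Every number in it (the node prices, the 48 TB of raw capacity per node, the 85% fill headroom) is a hypothetical placeholder, not a figure from the talk:

```python
def cost_per_usable_tb(price_per_node, raw_tb_per_node, replicas=3, headroom=0.85):
    """Rough $/usable-TB: raw capacity divided by the replica count,
    derated by a fill headroom (never plan to run a cluster 100% full).

    All inputs are hypothetical, for illustration only.
    """
    usable_tb = raw_tb_per_node * headroom / replicas
    return price_per_node / usable_tb

# Hypothetical comparison: a name-brand node vs a white-box node,
# same raw capacity, different sticker price.
brand = cost_per_usable_tb(price_per_node=12000, raw_tb_per_node=48)
white = cost_per_usable_tb(price_per_node=8000,  raw_tb_per_node=48)
print(f"brand: ${brand:.0f}/TB usable, white-box: ${white:.0f}/TB usable")
```

The point of the sketch is that replicas sit in the denominator: going from three replicas to four raises the cost per usable terabyte by a third no matter whose hardware you buy.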
Do you think you're going to have a cap on your data? Will what you run today look the same in a year? That will change all of this, but I tried to find something in the middle to talk about today. First of all, my first piece of advice: don't overdo it. Don't over-architect the hardware today, because as you get production data and production traffic on the system, it will all change, and that's another great aspect of Swift: you change your architecture as your data patterns and volumes change, just like we saw with breaking out the tiers. Start in a combined PACO node; if you don't have enough throughput, break out your proxies, or maybe add more proxies. So I put together a matrix; I think it's big enough to see up there. The first line is testing. Very important. Create a test cluster, get data on it that's close to production, and point some applications at it; this will help you arrive at a better architecture later. What's important in a testing cluster? Maybe you only have the ability to run one node. If so, make sure it has enough hard drives to simulate failure and understand how the software works. This is your chance: if you rip a couple of hard drives out, what should you expect from the software? What will happen? How will your applications deal with the way Swift works around failures? I would recommend always having three nodes. Turn a node off; what happens? Delete the node out of your cluster; now what happens? This is a great learning experience, both for the developers consuming the service and for the operators running your Swift cluster. The next step, small deployments, I consider to be in the range of maybe 100 to 200 terabytes of usable storage. The recommendation for most everybody: PACO nodes will take you far. You don't need to break out proxies unless you're in an extreme edge case of performance. Oh, and yes,
we have the archive use case coming down the middle of the matrix. I consider that write-once, read-maybe; if this is truly just online storage for something, you might architect a little differently. Medium deployments, the big line in the middle: maybe just below a petabyte, up to two or three petabytes of usable space. It's time to start breaking out the proxies; it makes it easier to grow your cluster over time, because you can add proxy capacity and storage capacity separately. Also at this scale, start thinking that you're not going to add a node at a time; maybe you add a rack at a time, or a group of nodes at a time. Have that conversation with yourself and your department so you know where you're heading; it'll save you headaches down the road. At these scales we also see more archive use cases. The difference there is fewer proxies: in an archive use case you might stream data slowly into a large cluster and never really read it again, but you need it there just in case, so save on the proxy tier; you don't need quick access or quick writes into your cluster. You can also go much deeper on the nodes, and here's where we see 60, 80, 90 hard drives per compute node, very big nodes. Then over on the right is something we're seeing more and more lately: how do I get high performance, how do I get a CDN out of Swift? Smaller nodes, fewer drives per node, faster networking, and sometimes breaking things out into all three tiers, with separate servers holding account and container data apart from the object nodes. The last line I put up there because of what happens when someone says, "I need 20 petabytes and I'm growing to a hundred within the next 18 months." I can't give generic advice for what that will look like, because at those scales it's very, very important to understand the throughputs, the use cases, the types of data, and how it's going to be accessed. At that point we start talking about custom builds.
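Eric's matrix can be condensed into a lookup like the following. The capacity thresholds are the talk's rules of thumb, not hard limits, and the wording of each recommendation is a paraphrase:

```python
def suggest_architecture(usable_tb, archive=False, high_performance=False):
    """Map a target usable capacity onto the talk's rough deployment matrix.

    Thresholds (200 TB, 3 PB) are rules of thumb from the talk, not hard limits.
    """
    if usable_tb <= 200:
        return "small: PACO nodes (proxy+account+container+object combined)"
    if usable_tb <= 3000:
        if archive:
            return "medium archive: fewer proxies, deep nodes (60-90 drives)"
        if high_performance:
            return "medium CDN: small nodes, fewer drives, fast networking, 3 tiers"
        return "medium: break proxies out from account/container/object nodes"
    return "large: custom build -- profile throughput and access patterns first"

print(suggest_architecture(150))
print(suggest_architecture(1500, archive=True))
print(suggest_architecture(20000))
```

It is deliberately crude; the testing row of the matrix is the real answer, since production traffic on a test cluster beats any lookup table.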
A custom build can save you a lot of money, and maybe that's the one place where you should overdo the design work. Let's see. If I put together a small, hundred-terabyte-usable cluster, and this is truly a production deployment, consider, if you're able to afford it, nine nodes: three zones, with three nodes per zone. It'll save you when you have a node failure in the future, because you will have one, and it'll be easier to recover, because the blast radius of losing a node becomes smaller the more nodes you have. In this case, PACO nodes again; you can get away with that unless you're on the high-performance end of the spectrum. Single CPU, 64 GB of RAM, 12 hard drives in a 2U, and one-gig networking; most servers come with four NICs, so bond them, do some creative VLANing, and you get good enough throughput. As you look at something bigger, where you're looking to grow, a medium deployment of 650 terabytes usable scales linearly sideways, so if you need a petabyte, add a rack. Like I said before, break out some proxy nodes; here I placed six proxy nodes across the cluster, two per rack. Notice that the CPUs in the proxy nodes are much faster; there are two of them, and I think this model is an 8-core at 2.5 GHz. Still 64 GB of RAM. The two hard drives are really just for the OS; no data goes across those. Also 10-gig networking, because as your cluster grows, the back-end networking needs to grow with it. And to make the 650-terabyte-usable number work: 15 nodes with 36 drives each. If you lose a node, it's not the end of the world, just a fifteenth of your cluster. Keep that in mind: if you have a node failure, you need spare capacity in the cluster, so don't ever fill your cluster 100%. I think that's all I had. Thanks, Eric. So let's talk a little bit about optimizing those builds, taking some of the principles that Eric just showed us.
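A quick sanity check of that 650 TB figure. The talk doesn't pin down the drive size, so the 4 TB drives below are an assumption, along with the default three replicas and the base-10 to base-2 conversion:

```python
def usable_tb(nodes, drives_per_node, drive_tb, replicas=3, fill_limit=1.0):
    """Usable capacity: marketing (base-10) terabytes converted to real
    base-2 tebibytes, divided by the replica count, optionally derated
    by a fill limit (never plan to run a cluster 100% full).

    The 4 TB drive size used below is an assumption for illustration.
    """
    tib_per_drive = drive_tb * 1e12 / 2**40   # 4 TB -> ~3.64 TiB
    raw = nodes * drives_per_node * tib_per_drive
    return raw / replicas * fill_limit

# The medium reference build: 15 nodes x 36 drives, 3 replicas.
print(round(usable_tb(15, 36, 4)))                   # ~655, near the quoted 650
print(round(usable_tb(15, 36, 4, fill_limit=0.85)))  # with failure headroom
```

The second line is the number to actually plan around: losing one of 15 nodes means the other 14 absorb its data, so the cluster needs standing free space for that rebalance.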
I'm going to assume what we generally see the most: a two-tier or three-tier setup, not necessarily the PACO we've been talking about; PACO is generally for smaller deployments, and this is for larger ones. I'll start with the proxy. We've been talking all this time about not using RAID, about relying on Swift to give us that level of redundancy for each individual drive in the system. The only place you may want to use RAID is for your operating system, and we actually have customers and deployments where they've chosen to go with just one disk for the operating system on their proxy or object servers. The reasoning is that when you scale to hundreds if not thousands of nodes, the loss of one of them is a tiny fraction of your entire cluster; maybe it's not worth buying a second hard drive for the operating system times a thousand systems, when you think about the cost. If you're able to withstand the outage of one node, and since everything is treated as an equal citizen within Swift, be it the proxy layer or any other layer, you may not need that level of redundancy. Start with a minimum of two proxies; I think that goes without saying, since we go through all this trouble for no single points of failure. Run at least two, and you'll probably want more, and again you can add these ad hoc when you need more bandwidth or more processing power. Up front, more CPU power also goes without saying: we want more CPU power in our proxy nodes, and I'll give you a little bit of math on that; it's the next point. The proxy nodes themselves are not really doing any disk I/O; what they're really doing is shuffling all the data that's coming in through the
HTTP API requests and spreading it down to the available storage behind them, so disk I/O is not something you need to take into consideration there. This next part is, again, a rule of thumb; it doesn't have to be an exact science, and I see all the cameras going up for it. It's how we figure out how much CPU we need based on the amount of storage, and your use case may vary. I'll give a quick plug to the book our team wrote on Swift, available at our booth; you'll see these principles there in a lot greater depth. To summarize the math: take a ratio of proxy CPU power to drive count for a low-concurrency workload, maybe a few hundred concurrent connections into the system; work out the overall processing power I have on my proxies; and then apply that to the number of disks on the back end. We'll show you an example that reflects that. The second example is a high-concurrency one: just a lot more CPU power for a lot more throughput and a lot more requests we can handle. For the accounts, containers, and objects, I'm going to make a general assumption that those services are held together on one physical box. Again, we don't need any RAID; Swift, the software, is going to take care of that, so don't worry about putting RAID in for your account and container SSDs. We do want SSDs there, because we want low-latency response times for the lookups and metadata changes that happen as we upload hundreds of thousands of objects into the system.
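Here's one way that proxy-to-drive rule of thumb could be put into a formula. The specific ratios (30 drives per proxy core, 8-core proxies) are hypothetical placeholders, not SwiftStack's published numbers; treat this as the shape of the math and benchmark your own workload:

```python
import math

def proxy_cores_needed(total_drives, drives_per_core=30,
                       min_proxies=2, cores_per_proxy=8):
    """Back-of-envelope proxy sizing: one proxy core per N back-end drives
    for a low-concurrency workload, with a two-proxy floor so there is
    no single point of failure.

    The 30-drives-per-core ratio and 8-core proxy are assumed placeholders.
    """
    cores = math.ceil(total_drives / drives_per_core)
    proxies = max(min_proxies, math.ceil(cores / cores_per_proxy))
    return cores, proxies

# The medium build from earlier: 15 object nodes x 36 drives = 540 drives.
cores, proxies = proxy_cores_needed(540)
print(f"{cores} proxy cores -> {proxies} proxy nodes")
```

For a high-concurrency workload you would shrink `drives_per_core`, which is the formula's way of saying "a lot more CPU power for the same back end."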
The metadata needs to reflect all of that quickly. Here's another general rule of thumb, and I've gotten feedback internally that it may or may not be exactly right, but I'll go with it: to size the storage you'll need for the account and container databases, figure about one to two percent of the overall used data in your cluster. It can vary, but it's a rule of thumb we like to follow. In other words, sizing those SSDs from a capacity standpoint is not going to be a huge deal; it's a relatively small amount of storage. Next, RAM on the object nodes, and I'd put a big asterisk on this one, because really the rule is the more RAM the better. The more RAM I have, the better back-end performance I'm going to see, but obviously as we scale out, RAM becomes one of the cost factors that changes the decision a little. The very minimum we'd like to see is one gigabyte of RAM per hard drive. Some of our customers go further: with four-terabyte drives they go to one gigabyte of RAM per terabyte of storage in the system. So one per drive is the minimum, and one per terabyte is probably more of a maximum, not that there's any real limit. The reason we want this is that at the very lowest level, every disk drive in the system is formatted with the XFS file system, and XFS leverages RAM for the inode lookups on each of those drives. The more of that I can cache up front, the better the lookup performance when I go to retrieve an object randomly from somewhere in the cluster. Then the object-node disks themselves. This part is going to be dated pretty soon, because I'm talking about three- and four-terabyte drives.
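Those two rules of thumb, a gigabyte of RAM per drive (or per terabyte at the generous end) and one to two percent of used capacity for the account and container SSDs, reduce to a couple of one-liners. The 36-drive node and 1.5% midpoint are assumptions for illustration:

```python
def object_node_ram_gb(drives, drive_tb, per_drive=True):
    """RAM rule of thumb from the talk: 1 GB per drive as the floor,
    1 GB per terabyte of storage at the generous end."""
    return drives * (1 if per_drive else drive_tb)

def account_container_ssd_tb(used_tb, fraction=0.015):
    """Account/container DB capacity: ~1-2% of used cluster data.
    1.5% is picked here as a midpoint; it is a rule of thumb, not a spec."""
    return used_tb * fraction

# A hypothetical 36-drive object node with 4 TB drives:
print(object_node_ram_gb(36, 4))                    # floor: 36 GB
print(object_node_ram_gb(36, 4, per_drive=False))   # generous: 144 GB
print(account_container_ssd_tb(650))                # SSD TB across the cluster
```

The RAM isn't for Swift itself so much as for the XFS inode cache described above; the gap between the floor and the generous figure is essentially how much of that cache you're willing to pay for.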
SATA drives, specifically. The six-terabyte drives are already out there and we're starting to see them; we're doing some internal testing with some of these helium drives. But the key thing to take away, and I should have bolded it, is that we want to use SATA drives, or maybe enterprise SATA drives. I keep harping on this point: make sure you don't overpay for Fibre Channel or SAS-based drives, because there's really no need for them. Throw out the preconceptions you have about rebuild times of RAID sets; that goes away. If we lose a disk in a Swift cluster, there's no such thing as a rebuild. What the cluster does is rebalance the data onto the existing capacity it has, and when you finally go grab that new four-terabyte drive to replace the failed one, Swift says, "OK, I got that capacity back," and rebalances on top of it. The point is that we don't need ultra-fast, low-latency drives to store objects, because Swift works in a much different fashion than a traditional RAID set. Don't think about rebuild times anymore; that's a thing of the past. Just to touch on the math at the end, since I think most people got pictures of it: take the desired capacity you want in the cluster, figure out the actual raw numbers when you convert between base-2 and base-10, and then work out the drive sizes and how many bays are in your chassis. As Eric was saying, underfill those chassis when you first start. The reason I say that: don't populate a 36-bay chassis completely, so that your next step to grow is buying a whole other 36-bay chassis and filling that whole sucker up. It might be a little easier to underfill it, scale out a little bit
wider, and leave five to ten of those slots open at the very beginning, while you're learning what your cluster does, what its performance characteristics are, and what it's like operationally to add capacity into the cluster. Networking is a huge piece of Swift. There are three different network interfaces that Swift uses. The outward-facing one is, quite obviously, the one handling all the inbound requests; we recommend these be 10-gigabit in almost every iteration of our deployments. (Sorry, I thought I heard a question.) Those requests are actually going to come from a load-balancing solution, and that's the next slide I'll talk about. The cluster-facing interface is the second one; it takes the data that already came in and disperses it out among the storage. Think about it: if a write request comes in for a one-gigabyte object, that object comes in at one gigabyte but needs to stream out at three gigabytes. So for all intents and purposes, your cluster-facing interface needs 3x the capacity of your outward-facing interface, and if you have four replicas the same principle applies: it now needs to be 4x, and so on. That traffic is not encrypted, which is something to take into account. I've worked with customers for whom encryption is a necessity, and out of band we need to come up with strategies for how to do that. The same applies to the replication network, which is the third and final network. The reason we have all these networks is really flexibility; it gives us different choices for how we want to architect Swift. The replication network is almost automatic when you have a multi-site deployment.
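The fan-out math from the paragraph above is worth writing down, because undersizing the cluster-facing network is an easy mistake:

```python
def cluster_facing_gbps(frontend_gbps, replicas=3):
    """A write arriving on the outward-facing interface fans out into
    `replicas` copies on the cluster-facing network, so size that
    interface at replicas x the front end."""
    return frontend_gbps * replicas

# A 10 Gb/s client-facing interface with the default three replicas:
print(cluster_facing_gbps(10))      # 30 Gb/s of cluster-facing capacity
print(cluster_facing_gbps(10, 4))   # 40 Gb/s with four replicas
```

This is for the write path; reads come back over a single path, so a read-heavy archive cluster can get away with a smaller ratio.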
In other words, with multi-region clusters, clusters that are in different geographies, this network gets leveraged. I already explained how data comes in and is dispersed out; the replication network is really used when changes are made to my cluster. If I lost capacity but added new capacity, or maybe added a whole new data center, that's where the replication network gets used. Sometimes, at least starting out, our customers don't use dedicated interfaces; they bond them together. I could bond two, three, or four ten-gig interfaces, maybe just for redundancy, but also to put VIPs on them that each of these services can leverage. The point is, they don't need to be dedicated. Then load balancing, and let me make sure I'm cognizant of time here. The load-balancing service is really handled out of band from Swift; it's another service we leverage, from a software or a hardware perspective. We can do SSL termination at the load-balancer level, or down at the proxy level; it depends on where you want that extra workload, that extra weight, to be handled. Do you want it at the load balancer or at each proxy? It's all dependent. What are some of the hardware choices? These are just a list; I'm not picking favorites by any means, just some we see out there, and there are software-based ones as well. I know we use HAProxy in a couple of instances. LVS, Linux Virtual Server, is actually something SwiftStack packages with our offering as a load-balancing service for smaller implementations, using a VRRP-based protocol. And some people don't even use those; maybe they use round-robin DNS for handling the inbound requests that get spread out
to the proxies. Coming to a close here, I want to make sure we at least touch on benchmarking a little. There are a couple of different tools out there, and again I don't mean to pick favorites, but these are some we see pretty regularly: ssbench, which was designed by SwiftStack and is maintained in the community; swift-bench, another very popular one; and COSBench as well. I tend to see ssbench more often than not, so that's what I'll show you an example of, at least a summation of what some of that data looks like. This is from an existing customer that one of my coworkers did some of the work for: a test cluster set up to see what the characteristics of PUTs and GETs, the writes and the reads, would look like. The three nodes you see up here are actually the benchmarking servers themselves, and then we've got the nine storage nodes and three proxies. (There's a typo here; it should be filled in all the way, so I apologize.) You can see the output you get. Generally speaking, if I'm doing GET requests, a bunch of reads, across the hardware I chose, I'm going to be bound from a CPU perspective; this is not a lot of bandwidth, it's a lot of requests coming in and out of the system. The way this works: we set up 50 workers at 10,000 user concurrency, asking how fast we can read 800,000 requests, and it wound up being about 16,000 random 1 KB requests per second against those 315 drives. Again, those are random, so keep that in mind. And you can see, as you go down and the objects being placed get larger, I no longer become bound at the CPU layer; I'm really, and to be honest
with you, this is probably not the greatest example, but I'm really bound network-wise up here, because the two nodes I showed you earlier, the benchmark workers, are the ones that are bandwidth-constrained; they can't push data out to the 315 disks quickly enough. But you can see it's a scale and a balance: is it the proxy tier I need to boost and beef up, or is it the object side of things? Beyond the GET requests, we also have some PUT requests against the same setup. One thing to take into account here: this is representative of what the user sees. The user sees some 6,000-odd requests per second, but we know as administrators that on the back end we multiply that by three, because three replicas are happening. When you think about it, that's the reason the writes are relatively close to the reads in performance; the back-end number would be about the same as the read requests were. I've got a couple of minutes left, so I want to quickly talk about SwiftStack, what our company does, and some of the software we've created. SwiftStack has created a web-based management infrastructure that helps manage your Swift deployments from cradle to grave. You can deploy a SwiftStack Swift deployment within minutes, and you can also manage it from the cloud; we have a couple of different options for how you can manage your entire storage infrastructure from the cloud. How do we do that? Our management piece is entirely out of band, meaning no data actually comes through it; it's solely for the purpose of managing your infrastructure. The Swift nodes themselves sit on your premises, local to your business.
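That user-visible versus back-end distinction is simple multiplication, but it's easy to forget when reading benchmark output:

```python
def backend_write_ops(user_puts_per_sec, replicas=3):
    """What the user sees vs what the disks see: every client PUT fans
    out into `replicas` back-end writes."""
    return user_puts_per_sec * replicas

# The ~6,000 user-visible PUT/s from the benchmark, default 3 replicas:
print(backend_write_ops(6000))  # the disks actually absorb 18,000 writes/s
```

So a write benchmark that looks three times slower than the read benchmark may actually be moving roughly the same volume through the back end.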
the data flows in and out of. We can't see that data, we can't pull that data; all we're getting is metadata about it. We open an OpenVPN tunnel between our controller (we call it the controller) and your nodes so that we can manage things like performance expectations and capacity fill rates: are there alerts, did something go down, did I lose a disk, did something go wrong with the cluster such that I need to rebalance the existing capacity? All of these things are outlined in the book; hopefully we have some copies left, and you can get one over at our booth. A lot of this is covered in great detail: how to scale these things out if you were building your own Swift cluster without SwiftStack. Then we show you, after the couple hundred pages you've read, and the operational difficulty you can imagine, how much easier this is with the management portal, built by people who have been doing this for three, four, five years and have really refined how to manage this infrastructure.

One last plug, because I know my marketing manager is over here and I want to get it right: we've got a party tomorrow night at 8:30 at the Tabernacle. It should be a lot of fun; we're all going to be there. Come talk Swift to us, ask me questions, ask Eric questions; we're happy to engage with you. And that's it. We have a couple minutes left, I know I ran a little late, but are there any questions I can answer for anybody?

Sure. So the question is: what's the breaking point for when I want to use SSDs for accounts and containers? I'll give a rule of thumb, and hopefully I don't botch this up, but I think it's generally about a million containers within an account, and then about a million accounts; that's generally the breaking point that we like
to give. The object number is less of a hard number I can give, because it's going to vary based on what your object size is; maybe we can talk a little more about that afterward.

Oh yeah, sure, that's actually a great question. No promises here, but one of the things we've been asked about, and that we're taking seriously, is InfiniBand connectivity. The architecture is such that you could place an InfiniBand infrastructure in band with this type of architecture, and I can use Swift with something like that, as well as with 40-gigabit infrastructure. The point is, anything that Linux (Ubuntu, Red Hat, and CentOS) is going to support, we're going to be able to support, because Swift sits on top of that as a software layer; that's the beauty of the system.

I know you had a question; I think we have time for one last one. So, to make sure I got it right: the ratio of metadata, meaning account and container information, to object data? Generally speaking, account and container information is one to two percent of the capacity; that's what it's going to take on those SSDs. So if I've got a petabyte of object storage, then I'm going to need roughly 10, maybe 20 terabytes of SSD capacity, but remember that's going to be spread across a whole bunch of nodes. All right, thank you everybody, I appreciate it, thanks for coming. Thank you.
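The replica arithmetic from the benchmarking discussion can be sketched as a quick back-of-envelope calculation. The figures (6,000 user-visible PUTs per second, three replicas, 16,000 random 1 KB GETs per second) come from the talk; the helper functions and their names are purely illustrative, not part of Swift or any benchmarking tool.

```python
# Back-of-envelope check of the benchmark numbers quoted in the talk.

def backend_write_rate(user_put_rate: int, replicas: int = 3) -> int:
    """Each user-visible PUT fans out to one write per replica."""
    return user_put_rate * replicas

def aggregate_bandwidth_mb_s(request_rate: int, object_bytes: int) -> float:
    """Aggregate payload bandwidth (MB/s) implied by a request rate."""
    return request_rate * object_bytes / 1e6

# 6,000 user-visible PUTs/s with 3 replicas is ~18,000 disk writes/s on
# the back end -- close to the ~16,000 random reads/s observed, which is
# why the read and write results look similar from the user's side.
print(backend_write_rate(6_000))  # 18000

# 16,000 random 1 KB GETs/s is only ~16 MB/s of payload: at small object
# sizes the cluster is CPU/IOPS bound, not bandwidth bound.
print(aggregate_bandwidth_mb_s(16_000, 1_024))  # 16.384
```

This is the general pattern for reading object-storage benchmark output: small objects stress CPU and disk IOPS, large objects stress the network, and write numbers must always be multiplied by the replica count to see the true backend load.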
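The metadata-sizing rule of thumb from the Q&A (account and container databases taking roughly one to two percent of object capacity) can be turned into a small sizing helper. The one-to-two-percent ratio is from the talk; the function and its parameters are illustrative assumptions, not a SwiftStack tool.

```python
def ssd_capacity_needed_tb(object_capacity_tb: float,
                           metadata_fraction: float = 0.02) -> float:
    """SSD capacity for account/container DBs, per the 1-2% rule of thumb.

    metadata_fraction defaults to the high end (2%); pass 0.01 for the
    low end of the range.
    """
    return object_capacity_tb * metadata_fraction

# Applying the 1-2% ratio to a 1 PB (1,000 TB) cluster gives 10-20 TB
# of SSD, spread across the account/container nodes.
low = ssd_capacity_needed_tb(1_000, 0.01)
high = ssd_capacity_needed_tb(1_000, 0.02)
print(low, high)
```

In practice you would divide that total by the number of account/container nodes to get a per-node SSD size, and revisit the ratio once you know your real container and account counts.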