Okay, I think we can get this started. My name is Nicolas Trangez, I work for Scality, and this is Bjorn. We're on the OpenStack team; we've been working a bit on a Manila driver, and today we will present what Scality did in the scope of OpenStack and how we integrated with it. But first we'll start with an overview of how we perceive OpenStack and the storage components within it, and a very short introduction to Scality and our RING product: how it works, the high-level architecture.

So, the storage landscape within OpenStack. You have your OpenStack cloud, which is running somewhere; it can be on-premise, or it can be at some public cloud provider. And within OpenStack you have four (there are only three on the slide, but there are four) different storage services right now. There is Cinder, which is responsible for block storage: creating block volumes and exposing them to your virtual machines. We have Swift, which is an object storage implementation. We have, very recently, Manila, which allows you to provision shared file systems and expose those through NFS or SMB to, again, your virtual machines. And then finally we have Glance, which allows you to store virtual machine images and then allows Nova to retrieve those images and boot instances from them.

Now, most of these things run on very standard hardware; there's no real connection from any OpenStack component to a very specific hardware platform. Given this, you can run into sort of silos within your deployment: you'll need a SAN for your block devices, you'll need a NAS for Manila to expose shared file systems, and then you'll need a Swift cluster to basically implement Swift.

Now, Scality's view on this is: given our product called the Scality RING, which is a software-defined storage product, we can allow you to run basically any OpenStack storage need on top of a single storage layer. So again, in this picture you have the four different components; we integrated a driver or a back end for every single one of them, and this allows you to scale out to your RING to store the data those components need, and in turn benefit from what the RING provides when it comes to resilience of data. We do replication and erasure coding, we have geo support, we have a very quick self-healing system, we know how to manage installations of hundreds or even thousands of nodes, and all of this runs on standard x86 servers.

So, a short overview of the Scality RING. The RING product itself is not a hardware product; it's fully software-defined, a user-space solution. We don't have any specific hardware requirements, we don't require any validation cycles that are costly both in time and in money, and we easily support heterogeneous hardware clusters, even across major generations. We have customers who installed the product years and years ago and they just keep extending it; they don't need to buy in 2016 the same kind of server they bought in, say, 2012, which would be impossible. You can just keep growing the system with totally new hardware.

The internals of the system are fully distributed; it's a kind of peer-to-peer architecture. So we don't have any centralized cluster map like some other solutions require, and we don't need to keep the history of this map either, which is a drawback of several other storage products.
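To make that concrete: because placement is derived from a hash of the key itself, every peer can compute where data lives without consulting any central map. Here is a minimal consistent-hashing sketch in Python, using the 36-node layout from the example; this is our illustration of the concept, not Scality's actual placement algorithm.

```python
# Toy consistent-hashing ring: our illustration, not Scality's algorithm.
import hashlib
from bisect import bisect

def token(name: str) -> int:
    """Hash a name onto the ring's integer key space."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

# 6 servers x 6 nodes each, as in the example installation below.
NODES = sorted((token(f"store{s}-node{n}"), f"store{s}-node{n}")
               for s in range(1, 7) for n in range(1, 7))

def owner(key: str) -> str:
    """The node whose segment of the key space covers the key owns it."""
    idx = bisect([t for t, _ in NODES], token(key)) % len(NODES)
    return NODES[idx][1]

print(owner("my-object"))  # every peer computes the same answer
```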
We don't need any centralized coordination at all; nodes just connect to each other and make the right decisions. And as such there is no single point of failure: we can have servers crashing, we can have racks crashing, we can have disks crashing, of course.

Now, thanks to all of this, the data you put in the storage system remains very durable and resilient. We have production-proven data replication and erasure coding mechanisms, depending on the kind of data you're trying to store. As I said before, we are very location-aware: you can have multiple geographical regions, you can have multiple data centers, and within a data center we take care of things like LANs, racks, and then servers and disks. And given all of this, the system is really always on, even when you upgrade it or extend the capacity. We have systems which have been running for years in a row without any downtime or unavailability of the data.

This is a very quick overview of how a typical installation would look, although this is obviously a very small one, with only six physical storage servers. We do recommend six servers; you can do smaller installations, but we never do this in production.

Now, on every single server we install what we call nodes; we install six of those per server, and logically these nodes construct some kind of ring, hence the product name. These 36 storage nodes form an address space, a key space, and every node is responsible for a part of it. This key space is constructed such that when you put in some data, say three copies of a piece of data, those copies are dispersed as much as possible: they don't end up on the same drive (that would be terrible), or even in the same rack, if possible.

And these nodes just run standalone. So whenever a node goes down or becomes unavailable, or there is a network split, the system will automatically start to reconstruct the data, or, as we call it, restore the class of service: the number of replicas, or the number of data and parity chunks you want when using erasure coding.

So this is, again, a very quick overview of our I/O stack. At the very bottom you have your servers, which have hard drives in them. Then we have the I/O daemons, where every I/O daemon is responsible for a single hard drive. On top of that we have the storage nodes, which are the nodes I talked about before; hard drives are shared across the nodes. And then on top of all of this we have what we call the connectors layer. We have lots of connectors, since we do both object storage and file storage. So we have connectors for object storage through, for example, CDMI or S3, or a homegrown, very efficient, low-level HTTP-based protocol. Then we have our own scale-out file system, which is a clustered file system sitting on top of this object storage layer. That one can be exposed locally as a FUSE file system on your Linux machine, we have a built-in NFS server if you need NFS access to your files and directories, and finally we integrate with Samba for Windows access to the file system. These connectors are fully scale-out, so you can run as many of them as you want, and they are also fully stateless: if one of them crashes, you can just fail over to one of the other ones.
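Because the connectors hold no state, client-side failover can be as simple as trying the next endpoint. A minimal sketch of that idea, assuming hypothetical connector hostnames and a generic HTTP get-by-key interface (the URL scheme is made up, not Scality's actual protocol):

```python
# Hedged sketch of failover across stateless connectors: any connector
# can serve any request, so we just walk the list until one answers.
# Hostnames and the URL scheme are invented for illustration.
import requests

CONNECTORS = ["http://conn1:8080", "http://conn2:8080", "http://conn3:8080"]

def get_object(key: str) -> bytes:
    last_error = None
    for base in CONNECTORS:
        try:
            resp = requests.get(f"{base}/objects/{key}", timeout=5)
            resp.raise_for_status()
            return resp.content
        except requests.RequestException as err:
            last_error = err  # this connector is down: try the next one
    raise last_error
```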
So, the Scality RING and OpenStack: what do we do with OpenStack, and how do we integrate with the different OpenStack services? First of all, a little overview of the use cases for storage within OpenStack.

At the very bottom left you have the local operating system images. As you know, when an operating system is running, and especially during boot, there is quite a high number of IOPS on that system. That is something you definitely want to keep local, ideally on some SSD or other flash-based storage; if that's not available, a local hard drive will do. There's no real need to disperse this storage; you can just keep it locally, because you're not supposed to be storing anything sensitive, or anything you have to retain over time, on your operating system volume.

Then we have slightly bigger things, which are the virtual machine images you constructed and want to deploy in your OpenStack deployment. These are bigger, going into the gigabytes, nowadays even hundreds of gigabytes, in size, and they contain your operating system and potentially your application, depending on how you deploy it.

Then higher up the stack we have block devices, or rather shared file systems, because it really doesn't make sense nowadays to put a file system on some block device somewhere if you can just have a shared file system, where you put static content for some web server, keep some administrative files (it could be a document repository), or log data you want to store for a longer period of time. And finally, at very large scales, terabytes to most definitely petabytes, you want to use object storage, because other systems will just not be able to manage it anymore. This could be video content, some medical data, research data.

So, the actual integrations. We integrate with Glance for your VM image storage, which goes through our REST connector; we call this project the Scality Glance store. It's currently not yet upstream, but we will work with the glance_store community to get it included as part of their distribution.

Secondly, we have Swift, the object storage solution, where we again use the very same back end, the most efficient one when using the Scality RING, to store your objects. We hook into the Swift object server there, because there is no real official way to extend Swift, and this seemed like the best place to hook in. As such, we are fully API-compatible with Swift: functionally, any middleware you may be running in your Swift proxy server will still work, even though you replace the Swift object server back end from local hard disks to Scality storage. We then coded some extra things to make it more geo-aware, given that the Scality product is very geo-aware, and this is fully open source; you can retrieve it from our GitHub repository, which we'll show at the end.

Then for Cinder, for data volumes, we provision sparse files in qcow2 format on our scale-out file system. So if you want to use a volume from different compute nodes, you just can, because the file system is distributed. This driver has been upstream since the Grizzly release.

Now, the most recent contribution we've been working on is the integration with Manila, where we use the NFS and SMB capabilities of our distributed file system. This one is also open source; it's still to be upstreamed, but we need to tune it a bit first before we want to push it.
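Coming back to the Cinder integration for a second: here is a minimal sketch (our illustration, not the actual driver code) of what backing a block volume with a sparse qcow2 file on a shared file system looks like; the mount point is hypothetical.

```python
# Sketch: provision a Cinder-style volume as a sparse qcow2 file on a
# shared POSIX file system, so any compute node mounting that file
# system can attach the volume. Not Scality's driver code.
import os
import subprocess

SOFS_MOUNT = "/mnt/sofs/cinder-volumes"  # hypothetical SOFS mount point

def create_volume(volume_id: str, size_gb: int) -> str:
    path = os.path.join(SOFS_MOUNT, f"volume-{volume_id}")
    # qcow2 allocates blocks lazily, so a large volume starts out tiny
    # on disk and grows only as the guest writes to it.
    subprocess.run(["qemu-img", "create", "-f", "qcow2", path, f"{size_gb}G"],
                   check=True)
    return path

print(create_volume("demo", 10))
```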
After this, I'll let Bjorn give a demo of what we're doing. It's all live, so fingers crossed everything works.

Yeah, thank you. So, we have a box here with six storage nodes, a little bit like on the slide shown before, plus what we call the supervisor, which is aware of all the nodes and which you can use to manage your ring. But it's only for diagnostics and management, so it can be totally offline and your storage will continue to work, no problem. I'm going to show you the supervisor UI a little bit. Here it is; I think you can see it.

It's hard to see the box down there, but essentially we have six servers here, each containing six nodes that all have access to the local disks. And we actually have two rings: one ring for the raw data and one for metadata. If you're storing stuff on our scale-out file system, that metadata would be extended attributes, timestamps, and so forth. So that's how it looks. I mentioned before that the key space is covered by each of the nodes, so we can go and look at that.

Pardon? How we are zone-aware? Yeah, so this is really up to you; we call them zones. Here in this installation we have two servers per zone, but those could be different fire zones in a data center, or different data centers; it's up to you when you deploy it. In this specific setup we have six servers, but there are always two servers in a single enclosure on a single power outlet, so that's why we combine two servers per zone.

So we'll take a quick look at the key space. Each server is called store1 through store6, each running six nodes, and here you can see which part of the key space is assigned to each of the nodes. That's just a very brief overview of the UI. We can look at the hardware inventory as well: here we see zone one, zone two, zone three and the different machines. Then we have scale-out volumes as well; you can define any number of those, and here we have three of them. Cinder volumes, for instance, would land on such a volume.

So here I'm going to upload a Glance image to our DevStack that is running here, and it stores the Glance image down there on our installation. Let's see if I need to log in again first. Yeah. So we're going to go to Images. Here we have the standard DevStack images, and we're going to create a new one and call it ubuntu. We need the nfs-common package in order to mount the Manila share, so we prepared an image with that. Here I'll set the format, and I think we're fine with the defaults. Create the image.

So this is uploading right now, and in the supervisor we have an object count. We can look at the data ring holding the actual data: you can see that right now we have 4031 unique objects, and that number should increase once this is uploaded. Maybe it's already uploaded, actually. Okay, we'll do some more operations. Oh yeah, here we see it increased, 4061.

Okay, so I'm going to start an instance using this image. We'll call it scality-manila, why not, use the uploaded image stored on the ring, and let's take a small flavor. Networking is fine, we need a key. Okay, we're all good. Let's launch.
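As an aside, the image upload done through Horizon a moment ago boils down to two calls against Glance's standard v2 REST API; here is a hedged sketch, with the endpoint, token, and file name as placeholders.

```python
# Sketch of the Glance v2 image upload flow (the endpoints are standard;
# the base URL, token, and file name below are placeholders).
import requests

GLANCE = "http://devstack.example:9292"
HEADERS = {"X-Auth-Token": "<token from Keystone>"}

# 1. Register the image record.
image = requests.post(f"{GLANCE}/v2/images", headers=HEADERS, json={
    "name": "ubuntu",
    "disk_format": "qcow2",
    "container_format": "bare",
}).json()

# 2. Stream the image bits; with the Scality store enabled, they land
#    on the RING rather than on local disk.
with open("ubuntu-with-nfs-common.img", "rb") as f:
    requests.put(f"{GLANCE}/v2/images/{image['id']}/file",
                 headers={**HEADERS,
                          "Content-Type": "application/octet-stream"},
                 data=f)
```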
So now I'm going to create a one-terabyte share and mount it in the machine we just launched; the Manila share is also hosted by our ring down here. It should be up soon. Yeah, so here we can SSH in. All right, we're good.

Let's go to Shares. I have a previous share; we can create a new one, call it scality, make it a terabyte, and here we can select scality as the share type. It's the only share back end we have installed, and we don't really care about the availability zones, of course. Submitting the form... oh, that's funny. So I'm making it a bit smaller; I think our resource cap is set to 1000 gigabytes. Okay, let's try 500 then, scality share type. Demo effect, of course. Should I log in again? I think I'm logged out. That's unfortunate.

Why don't we look at this share here; this one is all good. Well, I will not do live debugging, I think. As we have a DevStack, we could see what's going on. It's actually fine; it seems like it's the form itself that doesn't want to be submitted, so it's more of a Horizon problem. We're running a DevStack here, so it's not a stable version. What if I submit it differently... NFS, of course, I'm sorry. I'm very sorry.

We do have a current share; let's see if we can mount that one instead. This one should already have an access rule, but not for this IP, so if I try to mount it, it should fail with a permission error. Let me at least create a mount point. "Operation not permitted": so our driver is operating properly; it seems to be a UI problem. So let's see if the UI allows us to actually add the permission here. We manage the rules, and let's add another rule: read-write for IP 10.0.0.4. That seems to be okay. So now we should be able to mount it, right? Okay, yeah, it's actually mounted. Let's go in as root. So we have our "Hello, Austin" file; that's something I created earlier during tests. We can touch another file here. All right, so it's mounted.

Now comes something more interesting: showing that this is actually live. We're going to launch a ping to one of the storage nodes, and Nicolas here is going to help me unplug it. But before you do that, I just need to get the IP right... here, take this one. I'm going to ping 10.0.10.11, and it seems to work. So can you unplug server number six? Okay, there it goes.

So we can go into the supervisor UI; it should catch up shortly. Okay, it's not caught up yet. We can go ahead and launch another instance while waiting for it to catch up. So this is in a degraded state here. We take our ubuntu image again and we go with a small flavor; network is fine by default, we add our key. Okay... I hear something happened.

So Nicolas, what's actually going on on the ring right now, when you unplug the node? So once we unplug this node, the keys it was supposed to manage start to be managed by the next node in this logical ring, which we showed in the slides before. And our rebuild processes, which detect that a certain server or disk is no longer available, or that something is wrong in your ring installation, will ensure that the class of service is restored again: if you use replication, the number of copies is bumped back up, so an extra copy is made for everything on a disk which is no longer available; or, if you use erasure coding and you lost some data or parity chunks, those are recreated as well on some other nodes.
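A toy model of that "restore the class of service" step, under our own simplifying assumptions (Scality's real rebuild process is of course more involved): every key whose live copy count dropped below the target gets new copies scheduled on surviving nodes.

```python
# Toy rebuild planner: find keys that lost copies and decide how many
# to recreate. Our simplification, not Scality's rebuild implementation.
TARGET_COPIES = 3

def plan_rebuild(placement, dead_nodes):
    """placement maps key -> list of nodes holding a copy."""
    plan = {}
    for key, nodes in placement.items():
        live = [n for n in nodes if n not in dead_nodes]
        missing = TARGET_COPIES - len(live)
        if missing > 0:
            plan[key] = missing  # schedule this many new copies
    return plan

# Losing node n36 costs "chunk-1" one copy, so one copy is recreated.
print(plan_rebuild({"chunk-1": ["n1", "n2", "n36"]}, {"n36"}))
```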
Once the node we just unplugged comes back, all the data will be reconciled to that node, the data that should have been there from the beginning, and everything is back to normal. Now, it's important to note that even though in the supervisor we still saw the green status at the beginning, even after we had unplugged a node, everything still runs. We don't need the supervisor to be up to date all the time at all; the connectors can still access all the other nodes and will figure out where to store or retrieve the data.

Yeah, so here I logged in on the machine we just launched... I think... no, I was lying to you, it's number five. Yes, that's better. So even though the storage system was in degraded mode, we were still able to launch an instance, which then fetched its Glance image from the ring again. That's right.

So now we're going to go over to Swift. We're going to create a container and upload a movie, which is eventually going to land on the ring. So yeah, we will see a demo of our Swift integration. Make it public. Okay, and here, upload the movie. Thank you. And by the way, we can also check that the Manila share is still reachable: here we can still list our files, even though we are in degraded mode. Everything is good.

Okay, so let's see if we can start that movie. I'm going to grab this URL here, start it in the player, copy-pasting here. This is really demo effect, isn't it? Am I doing something wrong, forgot a slash? I must have forgotten a part. Sorry about this; it should really work. I'll get it... okay, maybe it was a copy-paste error. So this guy should be running somewhere. Yeah, there he is. All right, so let's go back to the slides.

Would you please reconnect it? Then we can take a look at the supervisor UI; this also takes a while before the supervisor is aware that it's back up, a matter of 15 seconds or so. If we have time, we can do a demo of Cinder later, but for now I think I'll leave the floor to you. Okay, thank you Bjorn.
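For reference, before the recap: the Swift part of the demo, creating a public container and uploading the movie, corresponds to two standard Swift API calls. A hedged sketch, with the account URL and token as placeholders:

```python
# Sketch of the demo's Swift steps. The account URL and token are
# placeholders; ".r:*" is Swift's standard ACL for anonymous read
# access, i.e. what "make it public" sets.
import requests

SWIFT = "http://devstack.example:8080/v1/AUTH_demo"
HEADERS = {"X-Auth-Token": "<token from Keystone>"}

# Create a world-readable container.
requests.put(f"{SWIFT}/movies",
             headers={**HEADERS, "X-Container-Read": ".r:*"})

# Upload the movie; with the Scality back end, the Swift object server
# writes it to the RING instead of to local disks.
with open("movie.mp4", "rb") as f:
    requests.put(f"{SWIFT}/movies/movie.mp4", headers=HEADERS, data=f)
```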
So, as a short recap: as you could see, the Scality RING can be integrated with any OpenStack-related storage service. It gives you basically unlimited capacity, and performance really scales up with the number of servers and spindles you throw at the system. We support any kind of x86 hardware, so you have real pay-as-you-grow agility and can grow with your business easily. And it is (except during a demo, of course) 100% reliable: we really have proven resilience in petabyte-scale deployments, and I myself have worked on projects of tens and tens of petabytes. Scality is definitely dedicated to OpenStack's future, so you can benefit from the growth and compatibility of both Scality and OpenStack. That leaves time for some questions. And whenever you leave the room, there are more t-shirts over there.

So, any questions? Go ahead. Sorry, as far as CAP goes? Ha, very good one. We are available, but it basically depends on which connector you are using, because for a file system you want it to be consistent. At the object layer, the very lowest object layer, we are available and partition-tolerant, not necessarily consistent.

It is currently NFSv3, yes. Due to how NFS works, your client will need to resend the non-committed data it sent before, so it's fairly standard NFSv3 failover; we don't have any magic in the back end there, indeed. And we are looking into v4 support, but as of now, in the public product, we don't support v4. Good question.

As Bjorn mentioned before, we have a couple of rings defined on that system: one ring for data storage and one for metadata storage. The metadata ring is a ring which in the end runs on SSDs, and that is the most common installation architecture. You only need this when you want to use the file system, unless you want very, very fast object storage, in which case you can scale up with SSDs as well. But you don't need an SSD layer if you are only doing object storage... sorry, you do, but not as a ring, so the SSD capacity you need is much smaller than the actual data storage. As I said before, for block we currently store a qcow2 file on the data ring; you may want to use an SSD-backed data ring, but then you lose the financial benefit.

We are really large-scale; I think I'll let you talk to our product manager, I honestly don't know. Product manager Paul is over there. Any other questions? Go ahead. Indeed, we do support rename, if that answers your question. Okay, any others?

Then we can wrap up here. The Scality crew... oh, I think currently that's three of us... yeah, and that's it. We'll be around: Bjorn, myself, there's Paul from product management, and Jerome, our CEO, even came to watch us. We have booth C9, you can find us on GitHub, and there are a couple of papers on the OpenStack part of our website. And, as said before: t-shirts. Thank you.