Hi, good morning. Well, good afternoon now. My name is JC Lopez and I work with the global storage consulting practice within Red Hat. What is particular about our group is that we not only do consulting, we also do training, so I actually teach the Ceph-related classes for Red Hat. I travel just about everywhere to teach about Ceph, specifically Red Hat Ceph Storage, but more generally everything related to Ceph. It just happened that I was asked to run this event today while you're having some good food, I hope. If you see me starving and rolling on the ground, that's probably because I'm hungry.

Okay, so the idea of this presentation is to focus on highlighting how we can integrate OpenStack with Ceph, so that we can use Ceph in an OpenStack environment. You will see how easy it is, and given what you can do with Ceph, the flexibility of Ceph that we'll review, and all the different access methods we have with Ceph, why it really is the best fit when it comes down to storage for your OpenStack environment.

To get started, we'll review a couple of slides, just two, about the storage concepts we'll be using during this session. First of all, the different types of storage we have in IT environments. We have block storage. Block storage has been around for a long time, as you can see from my gray hair; I started in IT quite some time ago, actually on mainframes, and at that time we already had block storage. The idea is that you present a device to your system, and you use that device by accessing the different blocks on it. Most of you know that on a block device, what we call a disk drive, we actually have sectors; so when we say block device, yes, we access blocks, but the blocks we access are really the sectors on the disk drive. To show you how common block storage is, it's very simple: I see a lot of laptops in the room, and you all know that your laptop boots from a block device.

And on top of that block device, what do we put? We put what we have here, which is the second type of access method: a file system. A file system is another way to access the storage, but instead of referencing a particular sector on the device itself, you reference either a directory or a file. You access the file: you read, you write, you open it, you close it, and you do whatever you want, creating directories, removing directories, creating files. Those two types of storage have been around for quite some time, basically as far as I can remember, since I started over 30 years ago.
That's exactly what we were doing 30 years ago. Now, the last player we have there, object storage, is a type of storage that appeared later. The concept became known when object-oriented programming came into the IT industry, because object-oriented programming offered one particular capability called serialization: the ability for an object you create in your program to save itself and reload itself. The idea of object storage was to take that feature and make it available as general-purpose storage. So when you access object storage, rather than referencing a directory name or a file name, you reference the ID of an object that you want to either load or write. Those are the three different types of storage we have in IT today, and tomorrow we'll probably have some more, because we keep reinventing ourselves and making things better.

Now, the second concept slide I'm showing you here is the different types of protection we can offer in software-defined storage. If you're familiar with storage, you've all heard about RAID technologies: RAID 0, RAID 1, RAID 5, RAID 6. The two ways we have to protect data within Ceph are as follows. First, what we call replicated storage, which protects the data by making full copies of every single byte you store in your Ceph storage cluster. The advantage of replicated protection is that it provides very high durability, because you can select the number of copies used to protect the data, and you can pick a number high enough to survive a single failure or even a double failure. Another advantage of replicated is that it provides very quick recovery: because you have full copies of the object, the only thing you need to do if something fails is reuse one of the surviving copies to recreate a new copy, and you're back in business. Finally, replicated is performance optimized: the object exists as a whole, it's not chunked, it's not encoded, nothing happens to it.
That's the best performance you can get, through replicated storage. The other type of protection we have is erasure coding. Erasure coding is a way to protect the data that is capacity optimized. We do basically what we do with RAID: we store the data and we calculate some parity, so that if something goes missing we can rebuild the missing data using the surviving data, including the parity we calculated. It's really the equivalent of what we do with RAID 5 and RAID 6.

Now let's have a look at Ceph, and Red Hat Ceph Storage specifically, and the way it is actually architected. It's a very nice diagram with different layers. At the top we have what we call the access methods, and we'll explain the different ones we have. We have an API known as librados that we'll discover. But the most interesting part, the one we're going to start from, is the bottom layer, which is RADOS. RADOS is the storage back end for the Ceph cluster, where we store every single byte of data that we want to keep in our Ceph cluster. For the access methods, we have one known as RGW, and we'll see that we can serve S3 and Swift requests from the RADOS Gateway; we have RBD, so that we can provide block devices on top of our Ceph cluster; and we have CephFS, which is a distributed, POSIX-compliant file system.

As you can see, and I want to start with this because there is one limitation, CephFS is the last one at the top. Within the Ceph community, CephFS is available, the code is in place, and you can use it. But with Red Hat Ceph Storage, which is the downstream product based on Ceph from Red Hat, we consider CephFS a technology preview in Red Hat Ceph Storage 2.0; later, in the near future, it will become production ready. So far we encourage our customers to try CephFS so that we can collect data about potential bugs that still remain, but we do not support CephFS in a production environment for customers using Red Hat Ceph Storage.

So RADOS, the reliable autonomic distributed object store, is an object store. Not surprisingly, on top of an object store we can do object storage if we want. This is the core back end of Ceph, and every single byte that we store in Ceph will be stored as an object in this object store. We'll show you how we distribute the data in this object store so that we can leverage the capacity we have across all of the disk drives in our Ceph cluster. It's a software-based component; Ceph is software-defined storage. As far as Red Hat Ceph Storage is concerned, we support customers on Ubuntu platforms and on Red Hat platforms, so on RHEL. And if you are using the Ceph community bits, well, you have even more choices: it's available for different distros, so you can run Ceph not only on RHEL and Ubuntu but on other platforms as well.
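To make the two protection schemes we just discussed concrete, here is roughly what creating one pool of each type looks like with the ceph CLI; the pool names, PG counts, and the erasure-code profile values are just examples, not a recommendation for your cluster:

    # Replicated pool: full copies of every object
    ceph osd pool create rep-pool 128 128 replicated
    ceph osd pool set rep-pool size 3        # number of copies to keep
    ceph osd pool set rep-pool min_size 2    # copies required to keep serving I/O

    # Erasure-coded pool: data split into k data chunks plus m parity chunks (RAID-5/6 style)
    ceph osd erasure-code-profile set ec-4-2 k=4 m=2
    ceph osd pool create ec-pool 128 128 erasure ec-4-2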
Now, when it comes down to RADOS, this object store, the idea is that we aggregate a collection of disk volumes that we have in our environment. We have these little squares here, and they represent what we call object storage devices, which we'll cover right after. Every object storage device is backed by a disk drive, and on that particular disk drive, depending on how we place the data, we will store the data when we have a write. When we do a read I/O operation, we will potentially come back to any of those object storage devices to read the data and serve the I/O request to the client. That's what we have in the back end, and as you can see we basically have two rows; we'll explain the circles in a very short moment.

Ceph is typically a scale-out model. You grow when you want more capacity, by adding more disk drives, but you can also scale out if you need more performance. Depending on what you are trying to achieve, you can even blend the types of disk drives you integrate in your Ceph cluster: you can have a mix, with some data stored on SATA drives, other data on SAS drives, and other data on SSDs, PCIe flash, or NVRAM. You can choose whatever you want, and we'll see that we can partition our cluster so that we can select which type of data goes on which type of storage.

The object storage devices, also known as object storage daemons, are the pieces of software that form the intelligent part of the Ceph cluster. Why intelligent? Because they are the ones that actually serve the I/O requests. We'll see that we have a second type of daemon whose job is to capture the current state of the cluster, but the workhorses of a Ceph cluster are the OSDs. You can scale to as many OSDs as you want. The bare minimum, if you just want to build a Ceph cluster to play with and see how it works, and I've done it many times just for fun, is a single disk drive; obviously that's for playing, not for production. In a production environment, by default we create three copies of each object to protect the data, so the minimum number of disk drives you need for a Ceph cluster is three. From there you can scale, mix the types of drives you want, and go up to thousands of disk drives aggregated in your Ceph cluster.

Now, the monitors are the famous circles we saw in the RADOS diagram. The monitors are there to maintain cluster membership and to maintain the maps that contain the list of all the elements present in the cluster and their current status. The monitors maintain different maps.
There's a map for the monitors, there's a map for the OSDs, there's a map for the MDSes, and every single element in those maps carries its characteristics, including the status of every single component. For every OSD running in the cluster, the map will say this particular OSD is up, or this particular OSD is down, so that at any point in time we know exactly who's running, who's not running, who is taking part in the cluster to hold data and who is not.

How many monitors do we need? The bare minimum for a production cluster is three. Why? Because of the way the monitors validate every single map update in the cluster. In order for an event to be validated and an update to be applied to the maps, we need a majority of the monitors to validate the event. So if one daemon stops, say an OSD, we need more than 50% of the monitors to agree on that event. That's why we want an odd number, so that we don't end up in a split-brain scenario. If we had two, we would indeed have no single point of failure, but because of the way the algorithm works, if one monitor stops we have one out of two still running; that's exactly 50%, not more than 50%, and that's why we need a minimum of three in a production cluster.

A question customers often ask when they start deploying a cluster is: can we deploy more monitors than just three? Yes, you can. Every single component in Ceph is scalable, so just as the number of OSDs is scalable, the number of monitors is scalable. What we usually recommend is that if you know your cluster is going to grow very large, above 300 or 400 disk drives, you deploy at least five monitors. If your cluster will have no more than 300 OSDs, three is good enough; there's no problem. The way we place the monitors is that we deploy them across the failure domains you are running in your cluster: if your failure domain is such that every copy of an object goes in a different rack, we recommend you deploy one monitor per rack; if you only have servers, because you don't have racks, we recommend you simply deploy the monitors on different servers.
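As a quick illustration, once a cluster is up you can ask the monitors for the maps and statuses they maintain with a few commands like these (the output will obviously depend on your own cluster):

    ceph -s               # overall health, monitor quorum, OSD up/in counts
    ceph mon stat         # the monitor map: which monitors exist and who is in quorum
    ceph osd tree         # the OSD map laid out against the CRUSH hierarchy, with up/down and weights
    ceph osd dump | head  # raw OSD map entries, including the pool definitions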
So where does an object live in this object store? We have RADOS at the bottom and the application at the top. We said the OSDs serve the I/O requests, fair enough, but where do we put the data? We need a way to balance and use the entire data space available in our Ceph cluster; remember, we can scale to more than a thousand disk drives, so we want to make good use of all that space, and we want to dispatch the data so that we get good performance by spreading the I/O requests across the different devices while also using the entire space available on all those drives.

The way it works is that in a Ceph cluster you have what we call pools, and those pools are logical partitions inside your Ceph cluster. That's what you use to grant permissions to the users trying to access the data in the cluster, but we can also assign to each pool a type of protection and a placement strategy. We'll see that there is one algorithm that drives the placement of the data in the cluster; this algorithm is known as CRUSH, and we'll explain what the acronym means. You can assign to every single pool one and only one placement strategy, so you can say this logical partition will have its data on SATA drives, this other partition will have its data on SSDs, and so on. Every time you access data in the cluster, you reference one object, and that object lives in one partition, one pool; you always reference one particular object in one particular pool, and based on the protection and placement strategy we go and look for the data on the OSDs that use that particular type of drive. Before we access the data, we also check that the user requesting access to the object actually has permission on that particular pool.

Now, the object is one thing, but how do we get a good dispatch of all the objects inside the cluster? To do that we have CRUSH, and CRUSH stands for Controlled Replication Under Scalable Hashing. Every single pool in the cluster is divided into sections, and every section is known as a placement group. When you store an object into a pool, or read an object from it, we determine to which placement group that particular object belongs; an object can belong to one and only one placement group, and you can configure the number of placement groups you want for a particular pool. The more placement groups you have, the more you disperse the sections across the object storage daemons; the fewer placement groups you have, the less you spread the objects among your object storage devices. The example I always give: imagine you have a Ceph cluster with 300 OSDs, so 300 disk drives; you create one pool, and only one pool, and you say I want to create that pool with one and only one placement group. What would happen? Because an object must belong to one and only one placement group, all of the objects would belong to that single placement group, and a placement group lives on an OSD.
Which means that in that particular case it would be very, very unwise to do so: even though you have 300 OSDs, because you have a single PG in your entire cluster, you would effectively be using a single disk drive. So the idea is that we have placement groups so that we can assign and distribute them across all the OSDs, and split both the load and the space across all the disk drives we have in the cluster.

CRUSH is a quick calculation. To determine to which placement group an object belongs, we hash the name of the object and take a modulo against the number of placement groups in the pool. The modulo operation gives us one and only one result out of the many PGs in the pool, so we can determine which one and only one placement group in that pool will host this particular object. Then, in the second step, we have to find where this particular placement group is actually hosted. So first we do the hash and modulo on the object name, and then we call a special function known as the CRUSH function; that CRUSH function tells us this particular placement group is on this particular OSD, and the application then goes and talks to that OSD to either write or read the object. This way we distribute the load across all the OSDs and we consume the space on every single object storage device in the cluster.

CRUSH is known as a pseudo-random placement algorithm. Why pseudo-random? Because there is a randomized portion, which is the hashing of objects into placement groups and the dispatch of the placement groups on top of the OSDs; but what is interesting with CRUSH is that the placement of the placement groups on the OSDs is tied to the actual state of the cluster. For one particular cluster state, imagine you have a cluster with a certain number of OSDs, all running. You stop one OSD; that changes the state of the cluster, because one OSD stopped. What do we do? We reassign some of the PGs that were on the OSD you just stopped to other OSDs in the cluster. Now imagine that an hour later you restart that particular OSD. What is the state of the cluster after the restart? The exact same one as before you stopped it. So what do we do? We pull those PGs back exactly where they were, on the original OSD. That's why we say pseudo-random: there is a randomized portion, but the placement of the PGs is the same for a given cluster state, and it always works the same way.
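You can actually watch this object-to-PG-to-OSD mapping from the command line. A minimal sketch, assuming a pool called mypool and an object name of your choosing (both names are illustrative, and the output shown is just an example of the format):

    # Create a pool with a power-of-two number of placement groups
    ceph osd pool create mypool 128

    # Ask the cluster where a given object name would land:
    # the output shows the PG it hashes into and the OSDs CRUSH picks for that PG
    ceph osd map mypool my-object-name
    # example output (yours will differ):
    #   osdmap e42 pool 'mypool' (5) object 'my-object-name' -> pg 5.7fc1f406 (5.6) -> up ([3,1,7], p3) acting ([3,1,7], p3)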
It is also a fast calculation, and the calculation is performed on the client side, not on the OSD side: the client, using its own CPU, its own RAM, its own local resources, computes the hash and modulo and calls the CRUSH function to find the placement of that particular PG on the OSDs.

CRUSH was designed to provide a statistically uniform distribution. A lot of people get puzzled when they start deploying a Ceph cluster because they see uneven space usage across the OSDs. Why? Because CRUSH was never designed, so far, to provide a perfectly even distribution, which is a different thing. There is a lot of work going on in the Ceph community, through Red Hat but also everyone participating in the community, to make CRUSH better and get a more even distribution of the data in the cluster; that work keeps getting refined with every single version, so that the distribution of the data, when it comes down to space, keeps improving.

The mapping is stable: remember what we said, the mapping of the placement groups is always the same for a given cluster state. You can stop and start OSDs, do whatever you want; as long as you don't change the number of placement groups, the mapping of the placement groups stays the same. You can also alter CRUSH and modify the CRUSH configuration on the fly.

CRUSH is infrastructure aware. The idea is that we can say, for example, I want to place every single copy of the data in a different rack, on a different server, in a different row, in a different IT room. You choose how you configure CRUSH, so you select where the data goes and where each copy of the object lands inside your Ceph cluster. You can adjust every parameter within CRUSH: the number of copies you want, the minimum number of copies that must be available so that we can keep serving requests, and you can influence the placement of the placement groups by adjusting what we call the CRUSH weight. Every single element in the CRUSH definition has a weight: the higher the weight, the more placement groups we give to that particular element; the lower the weight, the fewer PGs it is given to manage. Ultimately, and this is a test that everyone who deploys Ceph tries at the beginning, what happens if you set the weight of a particular element to zero? CRUSH removes all of the placement groups that were assigned to that particular object storage device, and as soon as you increase the weight again to something greater than zero, CRUSH starts assigning placement groups to that object storage device again.
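For instance, draining and then refilling one OSD by playing with its weight looks roughly like this; osd.5 and the final weight value are just examples:

    # Set the CRUSH weight of osd.5 to zero: its PGs get remapped to the other OSDs
    ceph osd crush reweight osd.5 0

    # Watch the cluster rebalance as the PGs move off that OSD
    ceph -w

    # Give it a weight again (commonly sized to the drive capacity in TiB) and PGs flow back
    ceph osd crush reweight osd.5 1.819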
Now, CRUSH belongs to RADOS, to the object store; that's the placement strategy, the placement algorithm we use. The first thing that comes on top of this object store is a native API, known as librados. We have different wrappers around librados so that you can insert calls directly into your application and store and retrieve data to or from a RADOS cluster. We do have customers at Red Hat that use that particular feature: they embed the librados calls directly into their own application, so they don't deploy any of the fancy access methods on top, no RADOS Gateway for S3, no block devices, no CephFS; they make the calls directly from their application.

When you have an application using librados, inside your application you make a call: you open a connection to the Ceph cluster, then you open what we call an I/O context, which tells the cluster, tells the API, which logical partition you are going to work with, pool name A, pool name B, pool name C. Then you can make your calls: get requests to retrieve an object from the cluster, or put requests to write something into the cluster. From the server where the application is running, through the librados shared library, we go and access the cluster and do whatever the application requested. This is a native protocol; it has nothing to do with the S3 and Swift protocols, it's what we call the native Ceph protocol. By the way, when it comes down to protocols, Ceph is entirely TCP based; there is zero UDP in the Ceph software, everything uses TCP.
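As a sketch of what that looks like from an application, here is a minimal example using the Python librados binding; the pool name mypool, the admin user, and the config path are assumptions for illustration, not part of the talk:

    import rados

    # Connect to the cluster: the client reads the monitor addresses and its key
    # from the usual ceph.conf / keyring files
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf',
                          rados_id='admin')      # client.admin, just for the example
    cluster.connect()

    # Open an I/O context on one logical partition (pool)
    ioctx = cluster.open_ioctx('mypool')

    # Put and get an object by name -- the native protocol, no S3/Swift involved
    ioctx.write_full('hello-object', b'hello from librados')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()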
Now, the access methods, what we call the higher-level access methods. The first one available on top of a Ceph cluster is the RADOS Gateway. The RADOS Gateway is a piece of software we deploy on top of the Ceph cluster, and it is actually a client of the Ceph cluster. Your application that uses S3 or Swift makes its regular S3 or Swift calls, both of which are RESTful protocols; they arrive over the network at the RADOS Gateway, the gateway interprets the request, does whatever it needs to do, and stores and retrieves the data in and out of the Ceph cluster. So the gateway is deployed on top of the cluster; it is not part of the cluster. It's exactly like the first case we explained, an application using librados: the RADOS Gateway is an application that runs and uses the Ceph cluster to store and retrieve data.

Remember we said Ceph is a scale-out model; the RADOS Gateway also falls into that scale-out model, and there is no limit on the number of RADOS Gateways you can deploy. You deploy as many gateways as you want, for example for performance reasons, because more and more requests are coming in through S3 or Swift and you want to be able to handle them. An advantage of the S3 and Swift protocols is that they are RESTful, so you can put your gateways behind HTTP load balancers, because S3 and Swift use HTTP to communicate. You can also deploy multiple gateways because you want to store different data in different pools: one set of gateways that stores data in a particular pool, and another set of gateways that stores data in another pool.

What is cool with the gateway is that, once you are comfortable enough with the configuration of the RADOS Gateway, you can have a single set of gateways store data in different types of pools. We use what the gateway calls placement targets, and you can have a single gateway store data in one pool that uses SATA drives and in another pool that uses SSDs. This makes the gateway highly scalable. A lot of people ask how scalable it is when it comes down to performance: we have numerous cases we've run with some of our customers, where we helped them design their infrastructure, and we were able to achieve well over two gigabytes per second using the RADOS Gateway and the S3 protocol. That gives you an idea of the kind of bandwidth you can generate, and the particular case I'm thinking of used only four RADOS Gateways: four gateways on a cluster, just over two gigabytes per second.

Now, the second access method we have is RBD. RBD is the ability to create block devices on top of the Ceph cluster. Remember, the gateway covers S3 and Swift, and Swift already talks to your OpenStack; with RBD and block devices, you can basically see where we're going: this is how we can support something like Cinder with block devices. The idea is that you can create as many block devices as you want in your Ceph cluster, and an advantage of the RBD feature is that every single block device can have its own characteristics. Why? Because a block device first lives in a pool, and remember that a pool can use different types of devices, so you can have block devices that use fast devices and other block devices that use capacity devices such as 8- or 10-terabyte SATA drives. The second thing is that, since everything is stored as an object in Ceph, for each RBD we can select the size of the objects we want to create in the RADOS object store. Why do we want to be able to do that? For performance reasons.
The bigger the objects, the better suited they are for bandwidth-intensive applications; but when you have I/O-intensive applications, with smaller I/O requests, you want to be able to adjust the size of each object so that you dispatch more of those small I/O requests across more disk devices. In a virtual environment, the VM is shown a particular block device, and the hypervisor accesses it through librbd, the library we use to access a block device in a virtualized environment. The block device the VM accesses is broken down into multiple objects, so depending on the sector or block you access on the device, we go and talk to this particular disk device, or this one, or that one. This way, when the VM throws all of its I/O requests at us, we spread those requests out, because every single object, remember, is assigned to one PG and the PG is assigned to an OSD. This distributes the I/O load across all the disk devices we have in our Ceph cluster.

The advantage of using RBD in a virtualization environment is that you can completely decouple the compute power from the storage. On your Nova compute nodes in the OpenStack environment, you need nothing; the only storage you need on a compute node is the ability to boot the node itself, basically a boot device. All of the storage used by the VMs will be in the Ceph cluster, and all of them will be doing the same thing: the VM is shown a device, the VM accesses the device, and through the hypervisor we go and talk to some OSDs out there in our Ceph cluster.

Another cool feature we have is a kernel module. On any regular server that has this module available, and that's the case for RHEL but also plenty of other distributions, we can access a block device directly from the operating system. Using the rbd command you can map a local device under /dev, and on that particular server you can do whatever you want with the device. What you will typically do is format it, create a file system on top of it, and then use the regular commands you'd use on your operating system to work with a file system that is sitting on top of an RBD device that actually lives in the Ceph cluster. On your Linux host this goes through a module we call KRBD; we call it KRBD just to distinguish between RBD, the name of the feature, and KRBD, the RBD kernel module inside the Linux distro.
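A rough end-to-end sketch of that kernel-module path, with an example image name and the default rbd pool (sizes, names, and flags are illustrative; on older releases the size is given in megabytes and the object size via --order, and some newer image features may need to be disabled for older kernels):

    # Create a 10 GiB block device in the pool "rbd"
    # (--object-size sets the size of the backing RADOS objects)
    rbd create rbd/my-disk --size 10G --object-size 4M

    # Map it on the Linux host through the krbd kernel module; a /dev/rbdX device appears
    rbd map rbd/my-disk

    # Use it like any other disk: put a file system on it and mount it
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /mnt/my-disk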
Now, the last access method we have, CephFS, is the ability to create a distributed, shared, POSIX-compliant file system, so that you can have multiple clients talking to a single file system; typically the kind of thing you would do in a NAS environment, for those of you familiar with the storage acronyms. RBD would typically sit in the SAN space, while CephFS typically sits in the NAS space. CephFS is really two things. It's a POSIX file system, and when you use a POSIX file system you use inodes: when you access the data stored in the file system, you access inodes. The thing with CephFS is that, remember, we have an object store in the back end, and an object store only knows about objects. The file system we want to use exposes inodes to the users, but what we store in RADOS are objects. So we need a way to do the mapping between the inodes and the actual objects we have in the Ceph cluster, but also to store the ACLs and various other information when clients access the CephFS file system. That's the job of a particular daemon, which handles all of this metadata, does the mapping, and stores the ACLs: this daemon is known as the metadata server, the MDS. The data itself, the actual content of the inodes, is stored as objects inside the OSDs of the Ceph cluster. To access the CephFS file system we have a kernel module, just like we have KRBD; remember, it's a tech preview for Red Hat Ceph Storage 2.0 on RHEL, but it's also available on other distros, Ubuntu for example, just to name one.

So that covers the functional part of Ceph. If we now map where it fits with OpenStack and why it is a good fit, that's all in one slide. We have the RADOS Gateway, which can be integrated with Keystone, so that we use the user definitions in Keystone to grant access to particular data; the RADOS Gateway supports both Swift and S3, so all of the Swift or S3 requests in your OpenStack environment can be handled by the RADOS Gateway, which, using librados, stores and retrieves the data in the Ceph cluster. For RBD, we have the ability to integrate Cinder with the RBD feature, so that every single block device you create with Cinder uses RBD inside Ceph. We also have the ability to integrate Glance with RBD, so that every Glance image is stored on a Ceph RBD. And we have support for RBD for Nova ephemeral storage: when you boot your VMs and we download the boot image into ephemeral storage, we can use RBDs to do that. The last part, completely on the right, is Manila. Remember, CephFS is tech preview only, so the idea is that when CephFS is tagged as production ready for Red Hat Ceph Storage, it will become a very good solution for Manila, so that you can configure the Manila shares you want to assign to your VMs and have them accessed from your VMs in your OpenStack environment.

The ability to use Ceph in every single part of your OpenStack deployment makes it a really simple solution, because it can fit all the spots, wherever you want to use it, and that's why Ceph is a very popular choice in OpenStack deployments. Quick question: who's using Ceph in their deployment? Only a few? It'll probably be more than that. Who uses NetApp? Who uses LVM? Three, four, five.
Okay. Now let's have a look at how we can do it, just to show you how easy it is, and the few commands and the few parameters needed to actually perform the integration of the pieces.

Glance images first; I put it all in one slide just to show you how short it is. As a reminder, Ceph and OpenStack have been very close to each other, and Ceph became very popular in the OpenStack community very early in the cycles. The OpenStack version that really brought a lot of stability and ease of deployment, when it comes down to the integration between Ceph and OpenStack, is Grizzly. Grizzly was the first version where things became really easy and pretty stable; that's when it all started, and then it just got refined more and more, gaining maturity and an even better deployment experience.

The first thing we do when we want Glance to use Ceph RBDs for storing the Glance images is to create a separate partition inside the Ceph cluster, so that this particular partition, remember we call these partitions pools, only holds Glance boot images. That partition can be created with one and only one command on the Ceph cluster, the ceph osd pool create command: you specify the name of the pool, the name of the logical partition you want to create, and the number of placement groups you want in that particular partition. You'll notice I use a power of two, because it's a best practice in Ceph, to get a better distribution of the placement groups among the OSDs, to use a power of two for the number of placement groups in each logical partition. Then you use the ceph auth get-or-create command to create one user, so that this particular user can access that pool; that's how we upload Glance images into the pool, and also retrieve them from it. At the end we specify the -o option with the path to a file that, in the Ceph world, we call the keyring file. In a Ceph cluster, to handle permissions, we create users, and each user is assigned a secret key; the keyring file is a special file that contains the name of the user and the actual secret key for that user. So if you want to be able to access the Ceph cluster from the Glance node, you have to make a copy of that keyring file onto your Glance node, so that the Glance node can actually authenticate to the Ceph cluster. We copy the keyring file with an scp operation onto the Glance node, and we also make a copy of the Ceph configuration file, which by default lives in the /etc/ceph directory on any of the Ceph nodes, and push that copy over to the Glance node.
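Put together, the Ceph-side steps look roughly like this; the pool name images, the user client.glance, the capabilities, the PG count, and the host name are all just examples:

    # Create a dedicated pool for Glance images, with a power-of-two PG count
    ceph osd pool create images 128

    # Create a user allowed to read/write that pool, and write its keyring to a file
    ceph auth get-or-create client.glance \
        mon 'allow r' osd 'allow rwx pool=images' \
        -o /etc/ceph/ceph.client.glance.keyring

    # Push the keyring and the cluster configuration file to the Glance node
    scp /etc/ceph/ceph.client.glance.keyring /etc/ceph/ceph.conf glance-node:/etc/ceph/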
In that copy, we just add two lines: in square brackets, the name of the user you created for Glance, and then keyring = the path to the location where you copied the keyring file. Just make sure that the keyring file you copied over to the Glance node has permissions such that the Glance daemons can actually read it; otherwise the connection to the Ceph cluster will fail, because without access to the keyring file the client cannot connect to the Ceph cluster.

Now for the Glance side, since that takes care of the Ceph side: we have the Glance stores section. We list rbd as one of the available stores, and if we always want to use RBD we change the default store to rbd. Then show_image_direct_url = True, and this is for a particular reason: to make good use of Ceph, when we boot the VMs we use the clone feature available on RBDs, and for the clone feature to be usable on the Nova compute nodes we need this parameter set, so that we can create a clone of the boot image, the Glance image, directly from the Nova nodes and boot the VM much faster. rbd_store_user is the username we created above; rbd_store_pool is the name of the pool we created above; rbd_store_ceph_conf is the path to the Ceph configuration file, by default /etc/ceph/ceph.conf, but you can put it anywhere. rbd_store_chunk_size is the size of the objects backing every block device we create in the RADOS object store: by default, if you do not specify this value, it is 8, so every object created for the block device that holds a Glance image will be 8 megabytes. If you want to use smaller objects, the values can only be powers of two, so 4, 8, 16, and the maximum value is 32; we cannot use objects bigger than 32 megabytes for an RBD device. Finally, flavor = keystone: when you deploy OpenStack out of the box and don't change anything, Glance is configured for LVM and this parameter is set to keystone plus cache management, to speed up access to the Glance images. Because we use RBDs we do not use that cache management, so you set the flavor parameter to keystone only. Once you're done with this, the only thing left is to restart your Glance services, and it will just work; then you can create your Glance images and upload whatever you want.

One note when it comes down to the Glance images: by default, most people store their Glance images in the QCOW2 format. But for RBD to be the most efficient, we recommend that you store those boot images in raw format. Why? Because this way you have a completely expanded boot image, and remember, we make a clone; by uploading the images in raw format, you actually speed up the boot process for your VMs. If you do not choose raw as the format, we have to do what we do for any other type of store: download the boot image and convert it to raw before we can use it. So one of the prerequisites is to upload raw images into Glance.
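On the Glance node, the end result looks roughly like this; the section names are the ones used by a Mitaka-era glance-api.conf, and the user and pool names match the examples above, so check them against your own release and naming:

    # Added to the copy of /etc/ceph/ceph.conf on the Glance node
    [client.glance]
    keyring = /etc/ceph/ceph.client.glance.keyring

    # /etc/glance/glance-api.conf (relevant settings only)
    [DEFAULT]
    show_image_direct_url = True

    [glance_store]
    stores = rbd
    default_store = rbd
    rbd_store_user = glance
    rbd_store_pool = images
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_chunk_size = 8

    [paste_deploy]
    flavor = keystone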
Now for Cinder, it's very similar; we have to create a pool. Why do we create a separate pool? Because typically the load we have on Glance images is completely different from the load we have on Cinder volumes: we always have a higher load, more users, and more space used for Cinder than for Glance. So create a separate pool and a separate user for Cinder.

One of the advantages with Cinder, and it has been there for a long time now, is that Cinder supports multiple storage back ends. You can create, for Cinder, multiple pools in your Ceph cluster, one pool that uses SATA drives, one pool that uses SSDs, and in Cinder you create two back ends: one for the pool that uses SATA drives, one for the pool that uses SSD drives. This way, depending on the type of VM, you can select on what type of storage you want to create the volume for that particular VM.

You copy the keyring file just like we did for Glance, and you copy the ceph.conf file just like we did for Glance. In the configuration file on the Cinder node, it's the same two lines to add: in square brackets the name of the user, followed by keyring = the path to where you copied the keyring file, so that the Cinder daemons can access the Ceph cluster. Then, when you create a volume, Cinder goes directly into the Ceph cluster and creates an RBD device right there, and that's the RBD that will be attached to the VMs.

In cinder.conf we create one or more Cinder back ends: in square brackets, the name of the back end you want to create in Cinder. The driver name will be cinder.volume.drivers.rbd.RBDDriver, and yes, the capitalization does matter. rbd_ceph_conf is the path to the ceph.conf, exactly as we did for Glance; rbd_pool is the name of the pool we created; and we have one special item here, rbd_secret_uuid. This rbd_secret_uuid is used so that when libvirt tries to access the Ceph cluster, we have a way to make the link with the credentials libvirt must use. This UUID, which you generate with the uuidgen command, will be used by libvirt, and we'll see right after the instructions on how to create a secret within libvirt so that libvirt can access the Ceph cluster when the VM is running. And rbd_user is the username. So that part is the integration between Cinder and libvirt, while all of the other parameters are used directly by Cinder to create the block devices in the Ceph cluster.

How do we create the secret for libvirt? We make one file, a temporary file you can name whatever you want, that describes a secret with an ephemeral attribute and a UUID, and that UUID is the famous UUID we referenced right here in cinder.conf. There are two options here, and it really depends on how people handle their configuration: you can either use a single username and a single UUID to access your Ceph cluster, even if you have multiple Cinder back ends, or, because some people want to be very strict, you create separate users and separate pools in the Ceph cluster and assign a particular user and a particular UUID to every single storage back end. It's up to you; you can do both.
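Roughly, the Cinder side then looks like this; the pool name volumes, the user client.cinder, the back end name, and the UUID are examples (the UUID is whatever uuidgen gave you):

    # Ceph side: a dedicated pool and user for Cinder
    ceph osd pool create volumes 128
    ceph auth get-or-create client.cinder \
        mon 'allow r' osd 'allow rwx pool=volumes' \
        -o /etc/ceph/ceph.client.cinder.keyring

    # /etc/cinder/cinder.conf (relevant settings only)
    [DEFAULT]
    enabled_backends = ceph-rbd

    [ceph-rbd]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_pool = volumes
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337   # from uuidgen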
Now, on the libvirt nodes, that's where you create the actual secret file: the usage type will be ceph, and you specify the username, followed by secret. Then you do a virsh secret-define using that particular file. This creates the secret; the secret contains the name of the user, and the ID of the secret is the UUID we have in the file. What is missing? Remember, we said that for a client to access the Ceph cluster you need to pass a username, which we already have here; what is missing is the famous secret key we were discussing earlier. How do we assign the secret key? You take the key of the user you created for your Cinder back end, and you do a virsh secret-set-value with --secret, the ID of the secret you created, and --base64, the key itself, since the secret value is passed base64 encoded. That becomes the secret key for that particular user.

At the bottom of the slide, if you got the proper version: did they print something, or did you get access to a PDF when you registered for the session? Okay, I'll make sure you all get the PDF just in case. There is one command that is actually very simple: remember we have the ceph auth get-or-create command followed by the username; there is also a command, ceph auth get-key, followed by the username, and this only extracts the secret key, which is exactly the key you want. So either you store it in a file and cat it, or, if you have the proper permissions, you can run the ceph auth get-key command directly inside $( ), command substitution, so that you do the whole thing in a single command. Now, if you have multiple libvirt nodes, which most of you should have, just remember that you need to synchronize the libvirt secrets directory, so that the secret you define on one libvirt node is actually available on all your libvirt nodes.
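A sketch of those libvirt-side steps, reusing the client.cinder user and the UUID from the cinder.conf example above; the file name and values are illustrative:

    # secret.xml -- the UUID must match rbd_secret_uuid in cinder.conf
    <secret ephemeral='no' private='no'>
      <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
      <usage type='ceph'>
        <name>client.cinder secret</name>
      </usage>
    </secret>

    # Define the secret, then attach the Ceph user's key to it
    virsh secret-define --file secret.xml
    virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
        --base64 $(ceph auth get-key client.cinder)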
Now remember, we said that Nova can also use Ceph RBDs for ephemeral storage. If you want to use that too, then in nova.conf, in the libvirt section, you set images_type = rbd. images_rbd_pool is the name of the pool where you want to create your RBDs for ephemeral storage; images_rbd_ceph_conf is the path to the Ceph configuration file; and disk_cachemodes = "network=writeback". Remember, this is ephemeral storage, so the idea is to speed up the level of performance we can get out of it: RBD supports different caching mechanisms, and one of them is the writeback caching mode, where we use RAM directly on the client to serve I/O requests from cache. We maintain that cache on every Ceph client node, which in the case of Nova means every Nova compute node. Then rbd_secret_uuid, the UUID that lets us reach the Ceph cluster through libvirt: just like we said for the multiple Cinder back ends, some people reuse the same user for Nova ephemeral storage, other people create a separate user for it. It's up to you; whichever you pick, that's the user whose key is contained in the secret you create for libvirt, and rbd_user is that username. Then you just restart the Nova services, and every time you boot a VM, the ephemeral storage will be directly on RBDs: Nova creates the RBDs directly and does whatever it wants on those RBDs as ephemeral storage.
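A minimal nova.conf sketch for that, assuming a pool named vms for the ephemeral disks and reuse of the client.cinder user and UUID from the examples above:

    # /etc/nova/nova.conf (relevant settings only)
    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
    disk_cachemodes = "network=writeback"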
Now, one thing that is going to be very important once you have your environment running. A lot of people, when they call because they have a problem and open a case, or when we run the trainings, get a bit scared and say: okay, we have our Nova compute nodes and all of our VMs running on them; when we have a problem, how can we actually find out what is going on? There is a feature available in Ceph known as the admin socket. The admin socket is the ability to talk directly to a particular connection. By default, when you are a client, you do not create an admin socket; only the daemons, the OSDs, the monitors, the MDSes, enable this feature by default. But the feature can be enabled on the Nova compute node, so that every time you instantiate a new VM there, the connections for that particular VM get an admin socket, and you can troubleshoot the connection when whoever is using the VM tells you: I don't understand, the VM is very slow today.

So, in ceph.conf, in square brackets, client dot the username: that will be the username you configured for Cinder, and if you want to troubleshoot the connection for the ephemeral storage, the name of the user for Nova. If you use the same user, you only specify one section; if you use multiple users, you have to create multiple sections. Then admin socket = /var/run/ceph, where you create a subdirectory, guests, and in there you use $cluster, which is the name of the cluster; by default, when you deploy, the cluster name is ceph, so with a default deployment it will always be ceph. Then dash $type, which will be client; dot $id, the user ID, basically the username from the section header inserted there; dot $pid, the process ID of your VM; dot $cctid, the Ceph cluster context ID. Remember, we said that when you open a connection to the Ceph cluster, you have to specify which pool you want to work with. So if, for example, you have two volumes assigned to the VM, one on a pool that uses SATA drives and another on a pool that uses SSD drives, those will be two different contexts, one to the SATA pool and one to the SSD pool, and this way you can troubleshoot either the connection to the SATA pool or the connection to the SSD pool.

To talk to that connection: ceph --admin-daemon, the path to the socket file, followed by the command you want to issue against the connection you want to troubleshoot. If you don't know which commands you can run, use help, and help will print on the screen all of the commands available and supported by that particular connection. One of the most used commands is perf dump, which dumps the performance counters for that particular connection to the Ceph cluster. What is cool, too, is that you can do a config show to dump all of the parameters used when connecting to the Ceph cluster, and config get and config set to inspect a single parameter or, most importantly, to change the value of one parameter. Why is that important? Because of this: remember, in this section we enabled the admin socket, but we also redirect and assign a particular log file to the particular VM. Using the config set command, when you are troubleshooting, you can dynamically raise the debug level of the library so that you get extra traces, and the traces will be stored in that particular log file. Then either you are familiar enough with the type of information logged and you do the troubleshooting yourself, or, if it's more serious, when you call support you can send them the file with the traces that you got directly from the production environment, so that you don't have to listen to the famous sentence: can you please reproduce the problem? Of course I cannot, it happened out of the blue. It's much better to be able to capture it live, as it is happening.
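Concretely, the ceph.conf section on the compute node and a troubleshooting session look roughly like this; the client.cinder section name, the paths, and the socket file name are examples, and the actual socket name will depend on the PID and context ID of your VM:

    # Added to /etc/ceph/ceph.conf on the Nova compute node
    [client.cinder]
    admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
    log file = /var/log/ceph/qemu-guest-$pid.log

    # Later, against the socket of a running VM's connection
    SOCK=/var/run/ceph/guests/ceph-client.cinder.12345.139953414397952.asok
    ceph --admin-daemon $SOCK help          # list the commands this connection supports
    ceph --admin-daemon $SOCK perf dump     # performance counters for this connection
    ceph --admin-daemon $SOCK config show   # every parameter used by this connection
    # Raise the messenger debug level for this one connection while the problem is happening
    ceph --admin-daemon $SOCK config set debug_ms 20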
Now, how do you do that? The parameters responsible for logging always start with debug underscore: you have debug_client, debug_ms, debug_osd, debug_mds, and so on. All of these parameters are accessible directly with this command. The highest level you can set a debug parameter to, in order to activate tracing in the log file, is 20. My recommendation, if you're having a problem and you see it happening: do a config set, and most of the time the first thing you will troubleshoot is the communication library, what we call in the Ceph environment the messenger library. That's the library used by everything in Ceph, whether it's the monitors, the OSDs, or the clients, to encode the TCP packets that get exchanged. So one of the first things you should do when troubleshooting is config set debug_ms 20, which sets the maximum level of debug for the messenger library, so that you see all of the messages exchanged from the Nova compute node where your VM is running, and for that VM only, with the monitors, with the OSDs, with everything it talks to in the Ceph cluster. At least with that, whether you open a case or you do it yourself, you'll be able to get the proper information.

We are running a bit slow, sorry. Okay, so you have extra slides covering the RADOS Gateway integration, and at the end of the slides, which we weren't able to cover today, everything related to Red Hat subscriptions, Red Hat Consulting, and Red Hat Storage training if you're looking for training on Ceph. And very important: you have access to a Red Hat Storage test drive, so if you want to play with Ceph, for those of you who have never tried it, you can create your own little Ceph cluster using what we call a test drive in AWS. Thank you so much; I hope you've enjoyed it and that you learned something.