Okay, let's welcome Abhishek. So, hello everyone. My name is Denis Kondratenko. I work at SUSE on a storage product that is actually based on Ceph, and today we will talk about Ceph and ELK. We'll talk about it from a different perspective: not how to run ELK on Ceph, but how ELK can help us with Ceph a little bit. Probably everyone knows what Elasticsearch, Logstash and Kibana are; that is the stack of products from Elastic. Elasticsearch is basically the database. Logstash is a pipeline that lets you get data, transform it somehow, and put it, probably, into Elasticsearch; at least that's what they claim. And Kibana is the GUI for all of this, so you can query Elasticsearch from it, monitor something, and so on. X-Pack is actually another product that provides machine learning and alerting on top of the same Elasticsearch database. What I will be talking about is logging. Everyone knows that Ceph is a cluster solution, and we have a lot of logs that we need to gather somehow, put into a database somehow, and use to alert people. There are some problems with that, so let's analyze them: first, how you can collect the logs.
There is the well-known old technology, syslog: with rsyslog or syslog-ng you can just gather the logs on the node and push them to Elasticsearch. There is also Filebeat, a special application that runs on the node; you can run it not just to gather the logs but also to preprocess them. As well, Ceph itself actually has built-in Graylog support, so Ceph itself can forward the logs wherever you want; you just need to configure it. Why do we need the logs? We need logs to store them for later use, to analyze them and understand whether some problem happened. If a crash happens, you can immediately alert someone. You can also analyze those logs with machine learning, which is built into Elastic's X-Pack, and Elasticsearch has a good R client, so that is another way you can explore those logs. And for sure, when the cluster is dead or something like that, you as the developer get a heap of logs and need to analyze them. So you can forward those logs, as I said, with rsyslog; rsyslog has this functionality built in, you can just configure it.
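A minimal rsyslog forwarding rule for this might look roughly like the following; the hostname and port are made-up examples, not anything from the talk:

```
# /etc/rsyslog.d/60-forward.conf
# Forward all facilities and priorities to a central ingest host.
# "@@" means TCP; a single "@" would use UDP instead.
*.* @@logstash.example.com:5514
```

In a real setup you would point this at whatever host runs your Logstash or Graylog input.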
So it will forward those logs, in Graylog format, to the server. And Logstash as well; this is actually why I am talking about Elasticsearch and Logstash, because it is a really simple framework with a lot of built-in functionality. It can already parse those logs, like syslog and Graylog messages; that is built in, so you do not need to do anything extra for that. They also have Filebeat, which can collect the information for you and push it to some server.

Ceph's Graylog support is built in. It is not really documented, but you can find the parameters in Ceph itself, and it works; at least for me it was working fine. You just need to configure those parameters and restart the Ceph services, one by one for sure if you want that. For example, we are using DeepSea, a Salt-based configuration management, so you can run one command and that is everything you need to do. Then Ceph will forward those logs to a Graylog server, in my case to Logstash, and Logstash can collect the logs and push them forward to Elasticsearch.

As I said, Logstash already provides some primitives for that, but you can not just push the logs through, you can also preprocess them. For example, you can extract some fields, and that is easy to do with grok patterns. They are just regular expressions that preprocess your logs, so you can push a little more information into the database or use it for alerting. There are a couple of examples; these slides are available at the link under our talk, so you can go ahead and look at them. Here is the Logstash pipeline, the filter part of it, that actually parses Ceph logs. It is like two lines of regular expressions that parse all the Ceph logs: we have two different log formats in Ceph somehow, and these two lines are an example of how to parse them, extracting some timestamps and some fields from the logs like the thread and so on.
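The exact patterns from the slide are not reproduced here, but such a filter might look roughly like this; the field names and the second pattern are my own reconstruction, assuming the common Ceph daemon log layout (timestamp, thread id, priority, message):

```
filter {
  grok {
    # Typical Ceph daemon log line:
    # 2018-03-01 12:34:56.789012 7f9b2c5fe700  0 log_channel(cluster) log [INF] : ...
    match => {
      "message" => [
        "%{TIMESTAMP_ISO8601:timestamp} %{BASE16NUM:thread}\s+%{INT:priority} %{GREEDYDATA:msg}",
        "%{TIMESTAMP_ISO8601:timestamp} %{DATA:daemon} %{GREEDYDATA:msg}"
      ]
    }
  }
}
```

With two patterns in the list, grok tries them in order, which is one way to cover two different log formats with "two lines" as mentioned above.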
As a developer, you probably need some development Elasticsearch cluster. It is really easy to deploy in Docker, and I used Docker Compose: it is just one YAML file where you define your cluster, you run docker-compose up, and it brings up, say, a two- or three-node cluster with Elasticsearch, Logstash and Kibana. It is really straightforward and easy to use for development, so I can recommend Docker Compose for development purposes.

Why do we want all this? To actually gather everything together in some interface and expose that interface to the user, so they can find things out themselves. I see this tool being useful for troubleshooting: you have the logs, you are a level-two or level-three engineer, maybe a developer, and you have a dashboard that queries Elasticsearch for some known problems or hints; you can see a picture of it on the screen. That will speed up your troubleshooting and analysis, and predefined patterns may tell you in the future what kind of problem you have. Kibana has built-in functionality for dashboards and for querying the Elasticsearch database, so it is easy to do that.

Here is one example. We have supportconfig, a special tool that our level-two and level-three engineers use to gather the logs from the cluster: they run it on a node and then send the result to us as developers, saying "here is some problem". With the database already prepared, you can build a cluster picture like in this example: how many nodes we see in the logs, what the kernel version is, what the Ceph version is, and what kind of problem is here.
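Going back to the development setup mentioned earlier: a minimal Docker Compose file for such a stack might look roughly like this. The image tags and port choices are assumptions for illustration, not the exact file from the talk:

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
    environment:
      - discovery.type=single-node   # fine for development, not production
    ports:
      - "9200:9200"
  logstash:
    image: docker.elastic.co/logstash/logstash:6.2.4
    ports:
      - "5044:5044"                  # e.g. a Beats/Graylog input
    depends_on:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:6.2.4
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```

A single docker-compose up then gives you the whole ELK stack locally for experimenting.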
Here is just a simple query for health not OK, like HEALTH_WARN or HEALTH_ERR. That actually gives you some idea of the current issues on the cluster. You can also run simple searches against the database to see what is happening in the cluster. So my idea, maybe, is that we can have some patterns and predefined searches and queries that tell you, as a developer or support engineer, what is happening, what is wrong, what the basic issues may be. In this example it is the RADOS Gateway that cannot find its key, I believe. That is a fair place to start; there are those gateway services, and if we have such a list of patterns we can parse, we can find out the issues or the state of the cluster much more quickly. For sure you can also use Elasticsearch in real time and discover some of the problems as well.

So that was my part, about Elasticsearch and parsing the logs. Now I hand the microphone to Abhishek, who will talk a little about a different approach: how we can use Elasticsearch even more.

Can you hear me at the back? Okay. So I am here to talk about the RADOS Gateway, the object storage component of Ceph, and how it actually ties into Elasticsearch. A bit about RADOS Gateway: this is the object storage client to the Ceph cluster. By client I mean it is a librados client to the Ceph cluster; it basically translates your HTTP requests into librados requests, and librados does all the intelligent placement of data based on your storage policies and everything like that. So yeah, it basically gives you RESTful API access.
It provides you RESTful API access to the Ceph cluster; the Ceph cluster itself was already covered in the openATTIC talk, so I am not covering the general architecture of Ceph again. We provide both Swift and S3 API access to the Ceph cluster. That is primarily because most of the client tooling for object storage is already built around these two very well-known APIs, so it makes sense to expose them rather than some API of our own: there is already a heavy ecosystem of client tooling, we just reuse the S3 and Swift APIs, and most clients are happy with that. We have the concepts of user accounts, buckets, ACLs and so on, similar to the Swift and S3 concepts. We support a lot of S3-like features and we have cross-access between S3 and Swift, though since the semantics of the two APIs are quite different, with some of these protocols you cannot, for example, upload an object with multipart in Swift and then access it in S3. So you have a lot of S3-like features: multipart uploads, object versioning, downloading an object as a torrent, lifecycle policies that let you delete or expire objects, support for encryption, compression support since the Luminous release of Ceph, hosting static websites, and now we actually support metadata search with Elasticsearch.

Since the Jewel release of Ceph we have this concept of multi-site, which is basically a geographically redundant Ceph cluster: you can replicate your Ceph cluster geographically to another remote location and transfer your S3 data there. That is the concept this is built on. Elasticsearch, as Denis already explained, is just a distributed, horizontally scalable document search engine built on Apache Lucene.
So it basically speaks a RESTful API, and literally every configuration you do in Elasticsearch is done via REST, so it is pretty easy to configure. The motivation behind building something like metadata search for RADOS Gateway objects with Elasticsearch is basically that you already have a lot of metadata associated with your objects. For example, if you are doing video analysis, you may have a tag saying this video was uploaded by this author or this user, and you might want to query and find out how many videos were uploaded by a particular user, with what average sizes, or something like that. Since it is an object store, you do not have any traditional file system or analysis tools at your disposal.

We have some support for this in the RADOS Gateway admin API, so you can get specific metadata when you query for it. But the problem is that it is very specific: if you ask for this specific bucket or this specific object, it will give you the metadata, but that does not lend itself very well to analysis. You also get no notifications when new objects, new buckets or new accounts are created, so you would have to constantly poll and write a very large-scale system to analyze this. This is where Elasticsearch comes in, because it already has built-in primitives for slicing and analyzing data. Permissions for users are also tricky with the existing admin API, because it is meant only for administrators and gives full access to all the metadata. As a storage administrator, you want to analyze things like: what are my top ten consumers of the object storage, what are my really hot buckets, how is my object storage being used on Friday evenings?
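A question like "top consumers" maps naturally onto an Elasticsearch terms aggregation. A rough sketch follows; the index name and field names here are assumptions for illustration, not the actual schema from the talk:

```json
POST /rgw-metadata/_search
{
  "size": 0,
  "aggs": {
    "top_owners": {
      "terms": { "field": "owner.id", "size": 10 },
      "aggs": {
        "total_bytes": { "sum": { "field": "meta.size" } }
      }
    }
  }
}
```

This assumes `owner.id` is indexed as a keyword field; the response would then list the ten owners with the most objects, each with their summed storage consumption.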
All of this is something that is very trivial to do with Elasticsearch.

On to the design: it is actually built on top of the multi-site architecture. In multi-site, what we implemented is that you have two Ceph clusters with two RADOS Gateways, and you are basically replicating the objects uploaded here onto another remote site. So you already have semantics for asynchronously copying data from this cluster to the other one. Since that is already there, and RADOS Gateway is the consumer on the remote side, we thought we could leverage this so the data is consumed not only by a RADOS Gateway but by a third-party plugin, which can forward this metadata and data to an external tier.

What we have built in since the Kraken release of Ceph is Elasticsearch support: you already have the metadata notifications on the remote side, so you just forward them to Elasticsearch. The same concept can be used to build backup solutions on top of multi-site, so you could have a plugin to sync objects from your Ceph cluster to Amazon Glacier, for example, or Amazon S3. You could even write a custom sync plugin that syncs your object data to tape, because tape might need custom semantics for writing an object. You can build something like Amazon Glacier yourself, because this is now actually possible: you get notifications when objects are uploaded, and you know how to pull the object from the remote side. This is the concept the Elasticsearch plugin was built on. Essentially you already have the metadata from the remote site and you just forward it to the Elasticsearch instance. What you have is a remote RADOS Gateway that is purely a forwarding proxy of sorts: it pulls data from
the original Ceph cluster and just forwards it to Elasticsearch.

There are some problems with this. You do not have an off-the-shelf authentication module that can work with Elasticsearch and RADOS Gateway users, and you really do not want to expose your Elasticsearch endpoint to the public, because if you have information about your object metadata, that tells you almost everything about your storage. So you do not want to expose this publicly. What we built from the Luminous release of Ceph is that, for normal users, RGW itself can authenticate the end user. So an end user of the object storage can also use the power of Elasticsearch to analyze his own user account and object metadata: in the object metadata we already have an attribute for the owner of the object, and that is how we authenticate and make sure the user only sees his own data. We also support custom metadata fields being indexed as Elasticsearch fields, so you are not limited to basic text attributes; you can have dates and other kinds of attributes.

So this is a very trivial diagram of the architecture. You have a primary RADOS Gateway and the primary Ceph cluster.
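Setting up such a metadata-search zone is done through radosgw-admin. The following is only a rough sketch of the tier configuration; the zone and zonegroup names and the endpoint URLs are made-up examples:

```shell
# Create a secondary zone and point its tier at the Elasticsearch
# sync module (all names and endpoints here are hypothetical).
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-es \
    --endpoints=http://rgw-es.example.com:80
radosgw-admin zone modify --rgw-zone=us-east-es \
    --tier-type=elasticsearch \
    --tier-config=endpoint=http://es.example.com:9200,num_shards=10,num_replicas=1
radosgw-admin period update --commit
```

After committing the period, the RGW serving that zone syncs metadata from the primary and indexes it into the configured Elasticsearch endpoint.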
That is your normal, regular storage cluster, and then you configure a remote zone and an RGW that only reads metadata from this primary cluster. This can be in the same Ceph cluster or a different one; I would recommend a different Ceph cluster if you want to be completely redundant and not depend on your primary data store. Then this RADOS Gateway basically forwards the metadata to the Elasticsearch cluster. And this is not just applicable to a single primary Ceph cluster: you can have a ring of three or four Ceph clusters, like Amazon's us-east, us-west and Europe-central regions, and forward all their metadata to a single Elasticsearch cluster.

This is an example of the JSON metadata we currently forward to Elasticsearch. You basically have the name of the bucket, the name of the object, your object version and the owner attribute. You also have the metadata that tells you the size of the object and the time it was uploaded. And since the latest release of Ceph we also have support for object tagging, so you can attach custom metadata per object: you can say this key and this value are associated with this object, which is very useful when you want to slice your data based on a key.
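The slide itself is not reproduced here, but the indexed record looks roughly like the following; the field layout follows the attributes described above, while all the concrete values are invented:

```json
{
  "bucket": "testbucket",
  "name": "video1.mp4",
  "instance": "null",
  "owner": {
    "id": "user1",
    "display_name": "User One"
  },
  "permissions": ["user1"],
  "meta": {
    "size": 712354,
    "mtime": "2018-03-01T12:54:16.462Z",
    "etag": "7ac66c0f148de9519b8bd264312c4d64",
    "custom": {
      "author": "someauthor"
    }
  }
}
```

The `owner` attribute is what the per-user authentication described earlier filters on, and the custom key/value pairs are what object tagging adds per object.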
You need all the videos by this author, or all the books written by an author, something like this. And since it is Elasticsearch, you can do aggregate queries pretty easily: you can get the average size of the objects uploaded, or break it down by date, and Elasticsearch will respond telling you, for example, that you had 22 objects in total with a total size of 177 bytes. You can also run queries on specific metadata content, which helps in finding data from specific users and so on.

Then there is the future work. Right now RADOS Gateway supports Elasticsearch up to version 5, so one piece of future work is supporting Elasticsearch 6. Then there are custom metadata fields for object tagging, which we do not support right now. Next is a plugin to analyze common system logs, the Logstash plugin, integration into the Ceph dashboard, and analysis with, for example, the machine learning that Elasticsearch provides. The supportconfig repository relates to what Denis demonstrated with Docker, and these are the official Ceph repositories and so on.

Any questions? How many minutes do we have? Five minutes.

So the question was whether we have any data on very large Ceph clusters with Elasticsearch. The answer is not really; we do not have any customer data on Elasticsearch right now. But we are not dependent on the primary Ceph cluster at all for the metadata. The mirroring is completely asynchronous, so it is not real time; there may be a delay before your metadata is actually in Elasticsearch, but it does not affect your primary cluster at all in terms of IOPS, the metadata path or any other path.

Any other questions? All right, thank you.