Hello everyone. Today I am going to talk about reworking observability in Ceph, which is essentially a use case of distributed tracing. So let's get started. A little bit about me: I was an Outreachy intern in summer 2019, where I worked on adding the distributed tracing library Jaeger to Ceph. Currently I am continuing the same project as an associate software engineer at Red Hat. I like exploring cultures, tech, science, and nerding out on books and discussions.

So what will we cover? First, I will give a basic overview of the Ceph architecture, so that we are all on the same page with the terminology used in the project. Then I will describe a problem that arises in distributed systems such as Ceph: the problem of context propagation. After describing the problem, we will look at an approach for solving it, which is distributed tracing. Then I will describe Jaeger, the implementation we picked for applying this approach, and present a few slides on how the library is performing for distributed tracing in Ceph. I will also talk about the key challenges I identified while working on Jaeger for Ceph, which anyone can use when adding a distributed tracing library to the microservices they work with. Then we can have a short Q&A session. So let's get started.

A brief introduction to the Ceph architecture. Ceph exposes client libraries such as librados, librbd, and RGW. These libraries take input from clients as objects and communicate through librados, which transforms each object into a form that is understandable by the internal parts of Ceph; basically, it converts an object into a format that can be stored in the Ceph backend. The backend consists of monitors, object storage daemons (OSDs), metadata servers, and RADOS, which handles communication and the intelligent handling of data.

When we perform a read or write operation, what happens is that the client obtains a cluster map from a monitor, which gives it a complete overview of our cluster. Once the client has the cluster map, it knows where to communicate to retrieve our object from: it calculates where the object is stored and retrieves it from there. Writes work the same way. RADOS is the intelligence unit that identifies the optimal places where an object should be stored. The OSDs are the intelligence units lying just above our physical devices; they communicate among themselves and with the monitors, and they handle the background read, write, backfill, and recovery operations that keep the whole cluster healthy.

Now, the problem that arises in such a distributed system is that because these processes are so distributed, there are asynchronous operations and a lot of context propagation happening at any one instant, and these contexts are not monolithic in nature; they are not threaded through as a single operation. So when we want to debug a process, we are often unable to identify the origin of our failure.
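To make the client-side picture concrete, here is a minimal sketch of how an application talks to the cluster through librados; the pool name and object name are placeholders for illustration:

```cpp
#include <rados/librados.hpp>
#include <iostream>

int main() {
    librados::Rados cluster;

    // Initialize the cluster handle as the admin user and read the
    // default ceph.conf for monitor addresses and keys.
    cluster.init("admin");
    cluster.conf_read_file(nullptr);
    if (cluster.connect() < 0) {
        std::cerr << "could not connect to cluster\n";
        return 1;
    }

    // An IoCtx is bound to a single pool; "mypool" is a placeholder.
    librados::IoCtx io_ctx;
    cluster.ioctx_create("mypool", io_ctx);

    // Write an object. librados computes the placement from the
    // cluster map, so no server address is ever specified here.
    librados::bufferlist bl;
    bl.append("hello ceph");
    io_ctx.write_full("my-object", bl);

    cluster.shutdown();
    return 0;
}
```

Note how the client never names an OSD: the cluster map it fetched from the monitor is what lets it compute the object's location itself, as described above.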
So in that case, what do we generally do in a distributed system? We go and look at the logs. But there are so many intelligent devices participating in the system that it becomes difficult to identify which machine to look at, read the logs there, and identify the cause of failure. As I mentioned, we cannot attach a debugger to four different processes and try to step through a request in that environment, and we cannot rely on logging libraries that sample or log excessively. Discrete, distributed requests and events often fail because of multiple triggers across different distributed components.

So, as developers working with a distributed system, we would be helped by a tool that identifies the abnormalities in the system: something that shows which process is taking a longer time, or a tree of how our context is flowing, which would ease the task of identifying the origin of a failure. The approach we can use for that is distributed tracing.

What happens in distributed tracing is that when a request enters our service, we assign it a unique identifier that propagates with it at each point it passes through. Suppose process A is the starting point of our service; we attach a unique identifier to that process, and that identifier propagates to each subsequent process. Along with the identifier we also propagate the timing, the events, and, if we want, metadata associated with our function. Suppose we want to record the flag options of a write, or some indicator of whether an operation executed successfully: we can store them as tags. What Jaeger does at the end is stitch together all these spans, the individual units recorded for each process, to form a trace. A trace demonstrates that our request started from point A, went to point B, then to point C; and that at D it is not showing the metric we desired, or it is taking longer than it usually should. By seeing this visual interpretation of how our request is travelling, we can identify performance issues in the system, where our processes are failing, or where our OSDs are going down.

Now, a bit about how Jaeger performs, Jaeger's architecture. First, we add instrumentation to our application. That instrumentation is done through a vendor-neutral API, which is OpenTracing. Let me first describe why we chose OpenTracing and Jaeger. OpenTracing is vendor neutral: we can attach different backends as the tracer, where the tracer is the component that collects all the spans and stitches them together to form a trace. So when we add instrumentation using the OpenTracing library, we can have a backend other than Jaeger as well; keeping future use cases in mind, we could attach a different tracer to the same instrumentation we have already done. Why we are using Jaeger specifically, I will highlight a bit later. For now: we just have to start a tracer. Once the tracer is started, spans are created through the OpenTracing API at the points where we have instrumented them, and once we open the Jaeger UI, we can see those spans assembled into a trace.
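As a minimal sketch of what such instrumentation looks like with the OpenTracing C++ API (the operation name and tags here are illustrative, not actual Ceph trace points):

```cpp
#include <opentracing/tracer.h>

// Sketch of instrumenting one operation: the span carries the unique
// identifier, timing, and any metadata (tags/logs) we attach to it.
void handle_request() {
    auto span = opentracing::Tracer::Global()->StartSpan("handle_request");

    // Attach metadata we may want to inspect later in the UI,
    // e.g. an indicator of whether the operation succeeded.
    span->SetTag("component", "example");
    span->Log({{"event", "request received"}});

    // ... do the actual work here ...

    span->SetTag("success", true);
    span->Finish();  // the span's duration is recorded on Finish
}
```

The instrumentation only touches the vendor-neutral `opentracing` API, which is the point made above: the same code keeps working if a backend other than Jaeger is attached later.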
So what happens is that these spans are collected by the Jaeger client, and the Jaeger client sends the spans to the Jaeger agent, which forwards them over UDP using the Thrift protocol to a Jaeger collector. The Jaeger collector is the one that stitches all the spans together by looking at the unique identifier attached at each process.

Now, the advantage we have with Jaeger is that it provides always-on tracing support. In the past, Ceph was using Blkin, with which we could only see traces by switching Blkin on and then switching it off again. With Jaeger, we can trace in production as well; it can be an always-on tracing library. And if we do not want the overhead of a tracing library, we can reduce the sampling rate to zero, which curbs any additional performance overhead that tracing can produce. Jaeger is also smart about sampling: if 10,000 requests are being produced, it will identify which spans and traces to take and present in the UI. You can even configure adaptive sampling or probabilistic sampling, which you can read more about in the Jaeger documentation.

To start a tracer, you just need to pass a configuration file, a config YAML, where you can specify the type of sampling you want, whether you want a database attached, and whether to enable logging. Once the tracer is started, spans can be added; these slides show some instances. The interesting part is the OSD input-output path that I worked on tracing. Over the horizontal bars you can see when each span was recorded and the duration of each span. This trace covers the OSD input-output path: it shows that the request was started, then enqueued, and so on, and how much time each operation is taking. If you want more detail, there are metrics such as tags and logs, where you can add any specific parameter that you want to see in the visualization. This visualization is one the Jaeger community is currently developing; you can see it is shown as an experimental feature.

As I mentioned, this is a work in progress. Basically, we are working on making Jaeger, along with OpenTracing, shippable with Ceph for long-term support. Once that is in place, you can see the trace points we have, as in this trace. Once support is added for making Jaeger and OpenTracing shippable with Ceph, we can easily instrument Ceph as per our requirements, and the important code paths that we generally face issues with will have instrumentation in place. You will not have to do that yourself; it is all taken care of by Jaeger. You just need to add instrumentation code, as I did here: I start a span and pass a ChildOf reference, which creates a child span of the OSD parent span. That OSD parent span is created in the common Ceph part and propagates through the whole process; you can work onward from that.

Jaeger also has backward compatibility with Zipkin. I have not tested this yet, but because of that compatibility, all the trace points that we currently have in Ceph could be rendered in the Jaeger UI; there is a specific port they have dedicated for that.
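Here is a sketch of what starting the tracer and creating a child span can look like with the jaeger-client-cpp library; the YAML keys follow the upstream README, and the service and operation names ("osd", "enqueue_op") are placeholders rather than Ceph's actual trace-point names:

```cpp
#include <jaegertracing/Tracer.h>
#include <yaml-cpp/yaml.h>

// A config.yml along these lines selects the sampler and logging:
//
//   disabled: false
//   sampler:
//     type: const   # sample every request; param 0 samples nothing
//     param: 1
//   reporter:
//     logSpans: true
//
void init_tracer(const char* config_path) {
    auto config_yaml = YAML::LoadFile(config_path);
    auto config = jaegertracing::Config::parse(config_yaml);
    auto tracer = jaegertracing::Tracer::make(
        "osd", config, jaegertracing::logging::consoleLogger());
    opentracing::Tracer::InitGlobal(
        std::static_pointer_cast<opentracing::Tracer>(tracer));
}

// Creating a child span under a parent span via a ChildOf reference,
// in the spirit of the OSD I/O path example from the talk.
void enqueue_op(const opentracing::Span& parent) {
    auto child = opentracing::Tracer::Global()->StartSpan(
        "enqueue_op", {opentracing::ChildOf(&parent.context())});
    // ... enqueue the request ...
    child->Finish();
}
```

The `const` sampler with `param: 0` is what the always-on argument above relies on: the instrumentation stays compiled in, and the sampling configuration alone decides whether any tracing overhead is paid.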
Some of the struggles I identified while working on this project: you have to handle spans with caution. There can only be one parent span in a given context; you can have multiple child spans and follows-from spans under it, but if you leave a span open, there is a probability that it will not be rendered, or that it will not produce a complete trace. So the closing of spans is important. There are also some features that Jaeger has yet to develop: it does not provide a cyclic graph-based model; there is only the parent-child structure, so maybe they will provide that support in the future.
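Since forgetting to close a span was one of the pitfalls mentioned, one defensive pattern (my own sketch, not something Ceph or Jaeger ships) is a small RAII guard that finishes the span when it goes out of scope, even on early returns or exceptions:

```cpp
#include <opentracing/tracer.h>
#include <memory>
#include <utility>

// Hypothetical helper: guarantees Finish() runs at scope exit, so the
// span is always closed and the trace can be rendered completely.
class ScopedSpan {
public:
    explicit ScopedSpan(std::unique_ptr<opentracing::Span> span)
        : span_(std::move(span)) {}
    ~ScopedSpan() {
        if (span_) span_->Finish();
    }
    opentracing::Span* operator->() { return span_.get(); }

private:
    std::unique_ptr<opentracing::Span> span_;
};

void do_work() {
    ScopedSpan span(opentracing::Tracer::Global()->StartSpan("do_work"));
    span->SetTag("stage", "start");
    // Even if we return early here, the destructor closes the span,
    // so the trace is not left incomplete.
}
```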