Good afternoon. My name is Mesh, and I work as a principal consultant at ThoughtWorks. My colleague Mushtaq is a Scala geek. For the last several months we have been working on a very interesting project called the Thirty Meter Telescope, building core infrastructure services and common frameworks for the telescope software. One of the first things we built over these months is a service discovery mechanism, and we are going to share some of what we learned about how CRDTs can be used to build one effectively. Quickly about ThoughtWorks: since a lot of people here know us, we can quickly skip through this slide. We are around 4,500 people across 15 countries and 42 offices. We are from ThoughtWorks Pune, and we are doing this project from the Pune office. A quick word on the Thirty Meter Telescope. This is going to be one of the largest optical telescopes when it is operational; as of now, the plan is 2027. When I say largest, what I mean is that the primary mirror of the telescope is 30 meters in diameter. The largest currently in operation has a 10-meter mirror. Building a mirror of this size and operating a telescope of this size is a huge technological challenge. A lot of the hardware is being built for the first time ever, for example a mirror this large. The mirror is not going to be a single piece; it is made up of nearly 500 mirror segments, with actuators and controls for each segment. A lot of this is experimental: the people building it are doing it for the first time. Five countries are involved, as a consortium: the US, Canada, China, Japan, and India. Software development is primarily happening in India. So let's look at the logical architecture for which we built this service discovery mechanism. As you see in this picture, the telescope is going to be a huge structure: a four- or five-story building like this one, with an aperture on top which holds the mirror. Obviously, to operate a telescope this large, there will be hundreds of hardware controls. In terms of layers, at the bottom-most layer you have the hardware controls, each with embedded software and its own proprietary protocol. To talk to this hardware there is a layer of Hardware Control Daemons (HCDs), which are similar to device drivers. Each HCD is built for a particular piece of hardware: on one side it speaks that hardware's proprietary protocol, and on the other side it exposes programmatic interfaces for the upper layers to control the hardware. Above this is an aggregating layer called the assembly layer. To give you an example: the telescope has a laser subsystem, used for forming an artificial star in the sky for calibrating the telescope. That subsystem has six laser devices which need to be controlled, so in terms of hardware and HCDs, you have six devices at that layer, with six HCD instances running, one controlling each.
Now, to control this laser subsystem as a whole, you have a laser control assembly, which the upper operational software talks to. Above the assemblies, we have the sequencer layer. A sequencer controls multiple subsystems: when you schedule a telescope observation, you need to operate the telescope as a whole, so you move the telescope, calibrate, then take pictures, and all of those are steps you need to program. That happens at the sequencer layer. And then you have the applications through which you control all of this. So those are the layers and the components. Above the hardware, all the components are built on the JVM, so they are Java components. The communication between these components is asynchronous message passing, and they maintain device-specific state, so actors were a very well-suited way of building them. These components are being built as Akka actors, since they need to run on the JVM. One of the key requirements was that any component you need to talk to, any Akka actor, needs to be discoverable. You should not keep IP addresses and ports in configuration: if a particular actor or device restarts on a different port, it needs to be discoverable at runtime. There needs to be zero configuration. This is a very common requirement if you are familiar with microservices, primarily in the container world, where you need a service discovery mechanism, and most container frameworks give you one. We had to build something very similar but slightly different, because we were also using actors and needed to discover actors. [Audience question.] Yes, we will cover that. The first half of the talk gives the context; the second half focuses on the CRDT-specific solution. So for building a service registry and a service discovery mechanism, you need to fulfill these common requirements; any registry or mechanism needs to have them. First, the registry itself should have minimal configuration. It should be fault tolerant, because your whole system depends on the registry being operational, so there should not be any single point of failure. You need health checks built in: if you are figuring out which services are available and their endpoints, you also need to know whether they are healthy. Another important requirement is watches. When you look for a particular service and it is not available, you should be able to register your interest and be notified when it becomes available. Or if a service registers itself, then restarts and registers a different IP address and port, anyone who depends on that service should be notified of the change. The registry should expose APIs so that services can self-register. If there are third-party tools you are using, say Redis or Kafka, you should be able to have agents which register those services as well. And obviously you should support not only actor refs but all kinds of services, HTTP services or plain TCP ones. A sketch of the interface these requirements imply follows below.
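To make these requirements concrete, here is a minimal sketch in Scala of the interface they imply. All names here (LocationService, Connection, TrackingEvent, and so on) are illustrative, not the actual TMT API:

```scala
import scala.concurrent.Future

// A service is identified by a name plus the kind of endpoint it exposes.
sealed trait Connection
final case class AkkaConnection(name: String) extends Connection
final case class HttpConnection(name: String) extends Connection
final case class TcpConnection(name: String)  extends Connection

// A resolved service: its connection plus where it currently lives.
final case class Location(connection: Connection, uri: String)

// Events delivered to watchers when a service comes, goes, or moves.
sealed trait TrackingEvent
final case class LocationUpdated(location: Location)      extends TrackingEvent
final case class LocationRemoved(connection: Connection)  extends TrackingEvent

trait LocationService {
  def register(location: Location): Future[Unit]                  // self-registration API
  def unregister(connection: Connection): Future[Unit]
  def resolve(connection: Connection): Future[Option[Location]]   // lookup
  def list: Future[List[Location]]                                // list / filter
  def track(connection: Connection)(cb: TrackingEvent => Unit): Unit // watches
}
```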
As I said, this is a common requirement, fulfilled by many frameworks in the microservices world, so one of the common questions we get is: why not use something like Consul or ZooKeeper, which already solve this problem? We did evaluate all three of these before moving to the CRDT-based solution. So let's quickly go through how service discovery works in ZooKeeper, etcd, and Consul. ZooKeeper, at its core, is a 3-to-5-node cluster of servers which form a cluster using a protocol called ZAB. That gives you fault tolerance and linearizable guarantees: if one node goes down, the values are preserved on the others. On top of that it gives you a file-system-like API, which clients can use for various purposes, one of which is service discovery. To use it for service discovery, when a service registers itself you create a node in ZooKeeper: a parent node for each service type, and a child node for each instance. A dependent service can then look it up in ZooKeeper. Health checking is a bit rudimentary in ZooKeeper. The node you create for a service instance is typically an ephemeral node, which exists only as long as the connection from that service to ZooKeeper stays open. The moment the connection drops, the node goes away, and that is your indication that the service is not available. It is very rudimentary support. It does support watches; all three frameworks do. But because this is client-server, with a separate set of servers you need to talk to, watches are implemented using long polling. You make a watch request to ZooKeeper, but you always need a connection open, and on that connection you get notifications. If, for example, you watch the service-one node, you get notified of anything that happens, a new instance getting added or removed, given that you have a connection open. The second one is etcd. It is very similar to ZooKeeper: at its core, three to five server instances forming a cluster, using the Raft protocol to give you consistency. It has a gRPC-based client API which services can use to register themselves. It is not a file-system API but a similar idea; it is more like a key-value store. etcd is popular because Kubernetes uses it internally for exactly this purpose: service discovery for containers. You register using the gRPC API, and it creates a key-value pair in etcd. You can set timeouts: you create a key and a value with a timeout, and then it is the client's responsibility to keep it alive. If the client cannot keep it alive, the entry expires after that timeout. You also have APIs similar to ZooKeeper's for lookups and watches. Watches here are slightly more optimal because of gRPC: it works on HTTP/2, so you don't need multiple connections; on a single connection you can have multiple watches.
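Going back to ZooKeeper for a moment, the ephemeral-node registration described above looks roughly like this. A minimal sketch, assuming the standard Apache ZooKeeper Java client and an already-existing /services parent path; the paths and addresses are made up:

```scala
import org.apache.zookeeper.{CreateMode, WatchedEvent, ZooKeeper}
import org.apache.zookeeper.ZooDefs.Ids

object ZkRegistration extends App {
  // Connect to the ensemble; the session stays open while this process is alive.
  val zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, (_: WatchedEvent) => ())

  // Persistent parent node per service type (assumes /services already exists).
  zk.create("/services/laser-assembly", Array.emptyByteArray,
    Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)

  // Ephemeral child node per instance: it vanishes when our session dies,
  // which is the rudimentary health check described above.
  zk.create("/services/laser-assembly/instance-", "10.0.0.5:4000".getBytes,
    Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL)

  // A dependent service lists the children and sets a watch for changes.
  val instances = zk.getChildren("/services/laser-assembly", true)
}
```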
The third one is Consul. It again has a similar architecture: at the core, a 3-to-5-node cluster, which is Raft-based. What is interesting about Consul is that it also has the concept of agents, and your services deal with agents to register themselves. Either you use an API to register, or you have a configuration file and the agent registers the service on its behalf. It also has health checks built in: you can register health checks which Consul uses to determine the health of a service. And it has lookup and watch interfaces as well. Watches, again, because this is client-server, are based on long polling. Now, the interesting part of Consul is the way agents and servers form a cluster. Across the servers, consistency is maintained by Raft. But the agents and servers together form a cluster using a gossip protocol, which is Serf. Gossip is a very common protocol used in clustered applications for maintaining a group membership list. So in Consul you know about all the services, but you also know the state of all the nodes: whether a node is alive, and which nodes are part of the cluster. That information is propagated across the cluster by the gossip protocol, by Serf, while the service registry itself is kept on the servers. [Audience question.] Yes, sure, it is a key-value store, on top of which you build a service registry. But the core usage of Consul is not just service discovery; you can use it for other purposes, like keeping application configuration, and for that you need the central registry. But if you just need service discovery, just information about services that needs to be propagated across the cluster, and you have something like gossip, which Consul already has for propagating node information, then the thought is this: if that gossip mechanism were exposed as a user API at the application layer, and your application could form a cluster and put its own information on the gossip, you could build a service registry on top of gossip itself. And the data structure you need for doing that kind of thing is, essentially, a CRDT.
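As a concrete aside before moving on: the agent registration described above is one HTTP call to Consul's agent API. A minimal sketch; the service name, port, and health endpoint are invented:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object ConsulRegistration extends App {
  // Register a service, with an HTTP health check, against the local agent.
  // The agent gossips node state via Serf and forwards this to the servers.
  val payload =
    """{
      |  "Name": "laser-assembly",
      |  "Port": 4000,
      |  "Check": { "HTTP": "http://localhost:4000/health", "Interval": "10s" }
      |}""".stripMargin

  val request = HttpRequest
    .newBuilder(URI.create("http://localhost:8500/v1/agent/service/register"))
    .PUT(HttpRequest.BodyPublishers.ofString(payload))
    .build()

  HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
}
```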
HashiCorp has a library called memberlist, which gives you just that Serf implementation. It does not have CRDT support as such, but it has broadcast support, so you can keep application-specific data that gets propagated to all the nodes. And because we were using Akka: Akka has clustering built in, with very good support, and on top of that cluster it has a very good CRDT library called Distributed Data. So we could use Distributed Data on top of Akka Cluster to implement something very similar to Consul, and it worked very well for us. Mushtaq will talk about that specific implementation. Before we go to CRDTs, in my own words, I will just say what the intuition is. The intuition is this: if whatever mechanism a gossip-based cluster like Consul uses to maintain its own member list were available to you as an application developer, you could keep your own member list, which would be your services. And if you read up on it, you will find that in any gossip-based cluster, the membership list is kept eventually consistent using a data structure which is a CRDT, whether they call it a CRDT or not. That was the case with Akka too, but then Akka went ahead and actually provided the same mechanism it uses to maintain the cluster membership as a library, and we are going to talk about that later. But before we do, let's revisit CRDTs. How many of you heard about CRDTs in the keynote in the morning? Okay, so you are familiar with the term. The Erlang community championed this idea, and it is getting popular in all gossip-based clusters, and we'll see why. To understand what a CRDT is, start from the definition: a conflict-free replicated data type. Replicated means that if your cluster has 10 nodes, the information you update and read is replicated, all of it, across the 10 nodes. Conflict-free means there is a mechanism to arrive at the same value across the nodes without conflicts, and it does not require coordination. Raft, for example, is a coordination-based system: it values consistency, and coordination gives you a consistent set of values. Here, you can do it without coordination. How is that possible? Because the operations allowed on these data structures are very, very restricted, and that restriction is enforced by requiring them to have a merge function. Let's see what a merge function is with an example. To build the intuition: a set is, loosely speaking, a CRDT. Assume a three-node cluster which allows an add operation, adding members, services, whatever you call them, and each such operation is allowed concurrently on different nodes. At some point in time the nodes will have different data: the first node knows about A and B, the last node knows about B, and so on. If you query for the set's value at that point, or check whether a value is present, you might get different results on different nodes. But the reason we chose a set as the example is that it has some interesting properties. It has a union function, and all of us understand set union intuitively: if you apply it n times, it gives you the same set, and if you apply it to three sets in any order, it gives you the same value. Using that principle, you can be assured that eventually all the nodes are going to see the same set. This union function, available on a set, is what we call a merge function, because it has certain essential properties: it is associative, commutative, and idempotent. Those are terms from algebra, but for intuition: if an operation is associative and commutative, then order does not matter. In a large cluster, a union operation may reach me via two different routes; in one case I receive your information before his, and that is all right, because it gives the same result as if the sequence were reversed. It is essential that the merge function is associative and commutative because we don't want to care about order; we want to be order-independent. Same thing for idempotency. Why would operations be performed multiple times? Because this is a cluster without reliable delivery: some messages will be dropped, there will be no acknowledgment, they will be retried, so there will be duplicates. Set union has the property that duplicates do not matter, which is idempotency. Using these three properties, we can ensure that these sets will eventually converge, without coordination. That is the intuition of a CRDT. The small sketch below makes the three properties concrete.
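Here is a minimal sketch of that idea in Scala: a grow-only set whose merge is plain union. The asserts check exactly the three algebraic properties just described:

```scala
// A grow-only set: the simplest state-based CRDT. Merge is set union.
final case class GSet[A](elements: Set[A]) {
  def add(a: A): GSet[A]             = GSet(elements + a)
  def merge(other: GSet[A]): GSet[A] = GSet(elements union other.elements)
}

object GSetDemo extends App {
  val node1 = GSet(Set("A", "B")) // each node has seen different concurrent adds
  val node2 = GSet(Set("B", "C"))
  val node3 = GSet(Set("C"))

  // Associative and commutative: the order in which gossip arrives is irrelevant.
  assert(node1.merge(node2).merge(node3) == node3.merge(node2).merge(node1))

  // Idempotent: a duplicated (retried) gossip message changes nothing.
  assert(node1.merge(node2).merge(node2) == node1.merge(node2))
}
```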
CRDTs have a few types, as you heard in the morning. That is not the focus of this talk, but for the bigger context I will mention them and then move on. The first type are CmRDTs, commutative replicated data types. They are operation-based, which means that only the operations, the changes as they happen, are propagated to the other nodes. They are a little less constraining than what we just saw, because they do not require idempotency. On the other hand, they require reliable delivery: it is essential that you get each message exactly once. You can get the messages in any order, that is fine, but because you get each message exactly once, you never have to apply it again, so the operation does not need to be idempotent. It still needs to be commutative, because operations can arrive in any order. So the data structure is slightly less constrained to build, and this kind of data type has a very important role in building event-sourced systems, where you want, say, a bank account to stay available during a network partition across two data centers and still be able to merge afterwards, and where you want meaningful domain operations, not just a trivial add to a set. The other type are CvRDTs, convergent replicated data types. They are state-based: every time the data is updated, its full state is transferred. They are more constraining, because they require all three properties we saw earlier; in exchange they do not demand reliable delivery from the protocol, and that is why they work very well with gossip protocols. And we are talking about Akka Cluster, which is based on a gossip protocol. So the data types we are going to talk about are CvRDTs, which are part of the Akka library called Distributed Data. One small note: even though you pass the entire state, in a service registry your services number a few hundred, so the state is small. On top of that, there is an optimization called delta-CRDTs, which tries to pass only the minimum information, just the changed state, to the other nodes, and that support is available in Distributed Data. This slide is just the list of CRDTs available in the library: counters, sets, and so on. The first three are the primitives. A set is easy to explain, so let's go to the LWW register.
LWW is a last-write-wins register. If two writers update the same value concurrently, a decision has to be made, and it is made based on which of the two writes is later; that value replaces the other. In terms of service discovery: in some cases you want multiple instances associated with one name, which is the usual case with Docker containers, and then you can keep a set against the name. But in our case, in the telescope, we want exactly one instance to be available against a name, so the right CRDT for us is the LWW register. The last one in the list is a composite data type. CRDTs actually compose, because of their algebraic properties. You can have a CRDT map where the key is a string and the value of the map is itself a CRDT. If you do that, the whole map becomes a CRDT with the same characteristics as the underlying values. That is a very nice consequence of the algebra. We are going to use one of those, the LWWMap, where the values are registers. Now, why are we using two CRDTs and not just one? We have two very distinct requirements. One is that I should be able to watch a particular service against its name and get all the events: not only when it comes up, but also when it goes away or changes its location. That is currently supported in Akka only at the level of a whole top-level CRDT. I cannot say, of this whole map, that I am interested in only one key; that is not supported. So I have to keep a register per service as a separate top-level entity. But there is another requirement: at any given point in time, I should be able to query and list all the services in my system, or filter them. If I am just holding a bunch of top-level values, I have no way to do that, so I also need to put them inside another data structure. That is the constraint we are working with right now, so we have to live with two CRDTs. Obviously, each add and remove of a service involves changes to both CRDTs; a rough sketch of the map half follows below. Before we go to the implementation, let's see how it looks in the bigger picture. These are the three nodes we are talking about, three different machines, each with a JVM running telescope software. It is not required that each machine has a single JVM; it can have more than one, and each JVM participates in the cluster. This is unlike Consul. On top of Akka Cluster, which basically forms a ring (there is a way to discover and join this cluster; we will not go into it here, but we touch on it in the last slide), we have our location service as a library; we are writing this library. And on top of that, you have a CRDT copy of your location directory, so the registry is replicated on all these nodes. The interesting thing is that all of this is within the same JVM process. If you take this simple picture, this is one JVM process, within which you have the Akka cluster node and also the registry, an in-memory copy of the registry available to you locally. That is very interesting.
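To make the map half concrete, here is roughly what registering a location in an LWWMap looks like with Akka's classic Distributed Data API (signatures as in the Akka 2.5 era; the key name and URI are invented, and the real design additionally keeps one top-level LWWRegister per service for tracking):

```scala
import akka.actor.ActorSystem
import akka.cluster.Cluster
import akka.cluster.ddata.{DistributedData, LWWMap, LWWMapKey}
import akka.cluster.ddata.Replicator._

object RegistrySketch extends App {
  val system = ActorSystem("tmt")
  implicit val cluster: Cluster = Cluster(system) // supplies the LWW timestamps
  val replicator = DistributedData(system).replicator

  // One top-level LWWMap holds every location, so we can list/filter the registry.
  val AllServices = LWWMapKey[String, String]("all-services")

  // Register: WriteLocal is enough, the gossip layer replicates to other nodes.
  replicator ! Update(AllServices, LWWMap.empty[String, String], WriteLocal)(
    _ :+ ("laser-assembly" -> "akka.tcp://tmt@10.0.0.5:4000/user/laser-assembly")
  )

  // Watching: a subscriber gets Changed(AllServices) on every update to the map.
  // Per-key subscription inside the map is exactly what is not supported, hence
  // the separate top-level register per service in the actual design.
  // replicator ! Subscribe(AllServices, listenerRef)
}
```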
So now let's see how this registry is accessed. We said we have to support actor refs, TCP, and HTTP; let's take actor refs first. Assume the first machine has three different actor systems, each with some number of actors that need to be discoverable by name. Those actors can directly call the API we provide: it is an in-process call. I am building an application, I get a handle to the library, like any Java API, and I make a call to register myself or to discover something. That is important to keep in mind. The same is true if you are creating a JVM-based HTTP server. We are using Akka HTTP, so we create the Akka HTTP instance from our main method, and after it has started, we register it with the location service, again through the local copy. That will not be true if I am starting Redis, because Redis is an external process. To bridge that, you create a JVM-based agent which runs alongside, and on behalf of Redis it registers the service and keeps track of the process. That is a utility we are creating. So that is the full picture of how it works internally, and based on the discussion so far we can see some advantages, which we can debate. The first is that we are required to use Akka, but that was already our architectural choice; if we had to do this for anything else, we would need another setup, for example a Consul cluster and agents to work around it. What we get in exchange is an in-process library, which is not possible in any of the approaches we saw so far; they were all external. Next, 90% of our services are actors, and actor refs have a very simple serialization story: it is just string serialization and deserialization, because every actor ref has a string representation. That is available out of the box if you are using Akka and Akka Distributed Data, so we did not have to write any custom serialization for actors. That was a real benefit. Also, each actor, whether in Erlang or Akka, can be watched for death, and if there is a death I get an event, based on which the entry can be unregistered from the cluster. That is death watch plus a bit of housekeeping; a sketch follows below. It acts like a built-in health check, but more direct. And because the replicated copy of the CRDT is present on every node, the most common operation we do, which is lookup, is very, very fast: it is an in-memory call. We are fine with the eventual consistency here: if the data is stale, my call to the service will fail, and that is fine, I look it up again. It goes to the extent that every remote call we make goes first through a lookup and then the call, and that costs us nothing. With an external registry we could not use that pattern, because each lookup would take milliseconds, maybe a couple of hundred milliseconds, and you would have to cache it for a while. And the last advantage is that the watch, the subscription, is truly push-based. There is no long polling involved: I register my interest, and I keep getting updates, including when the service goes away, without having to keep a connection dedicated to it.
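The death watch plus housekeeping mentioned above can be sketched like this; unregisterFromCrdt is a stand-in for the replicator update shown earlier, not a real API:

```scala
import akka.actor.{Actor, ActorRef, Terminated}

// An actor that death-watches every registered ActorRef and removes its entry
// from the CRDT registry when a Terminated signal arrives.
class DeathWatchHousekeeper(unregisterFromCrdt: String => Unit) extends Actor {
  private var watched = Map.empty[ActorRef, String]

  def receive: Receive = {
    case (name: String, ref: ActorRef) => // a new registration to track
      watched += ref -> name
      context.watch(ref)                  // built-in Akka death watch

    case Terminated(ref) =>               // the actor crashed or stopped
      watched.get(ref).foreach(unregisterFromCrdt)
      watched -= ref
  }
}
```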
But there are challenges. The first challenge arises because we are using two CRDTs, and there is no guarantee of atomicity across updates to the two. That means I can succeed in updating the first one and then fail on the second: a partial success. There is no way to get more than that. So what do we do? The workaround was simple for us: we just error out. We return an error response to the client, which is a hint to try again. And when the client tries again, that is perfectly fine, because the registration operation is idempotent: from the same host you can register any number of times, so if an attempt fails, the next attempt will succeed. It is not foolproof if someone is maliciously trying to break the system, but given that it is an internal application, we think that is good enough. And in future, if the CRDT library allows us to watch events for a single key in a map, then maybe we don't need the second one at all. The second challenge is that the whole thing is tied to the cluster, which means you have to know how to operate an Akka cluster, which anyway is learnable, but you also have to know what happens when a node crashes without cleaning up. There is no way to know whether a node is unreachable temporarily or really down. You can know as an external party, but as a member of the cluster you can never be sure, because you never know whether it is a network partition or the node is genuinely down. And if you try to make that decision yourself inside the cluster, it can give rise to split-brain scenarios, which is common to all such membership systems. So the usual recommendation is to avoid mechanisms that automatically mark a member as down. We did not listen to that advice. What we do is: if a member is not reachable for 10 seconds, we call it down. And we don't care even if there is a split brain, because we are using CRDTs: if the partition heals, the CRDTs will converge. But that is not even the main reason. The main reason is that this is a telescope system: if there is a network partition, there are much bigger problems to solve at that level. The telescope will not be functional anyway, and our software is the least of the problems. It is very unusual to go with automatic downing, but we have found it works for us. Now I have a small demo. How much time do we have? I am going to show registration, unregistration, and then subscription, which means let's create two applications, and I am going to start them. A sketch of what the two processes do, in terms of the interface from earlier, follows below.
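In terms of the hypothetical LocationService interface sketched earlier, the two demo processes amount to something like this:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object DemoSketch {
  // Process 1: the listener subscribes to one name and prints every event.
  def listener(ls: LocationService): Unit =
    ls.track(AkkaConnection("demo-service")) {
      case LocationUpdated(loc)  => println(s"registered at ${loc.uri}")
      case LocationRemoved(conn) => println(s"unregistered: $conn")
    }

  // Process 2: register actor A, unregister it, then register B on the same
  // name. The listener should therefore observe exactly three tracking events.
  def worker(ls: LocationService): Future[Unit] = {
    val conn = AkkaConnection("demo-service")
    for {
      _ <- ls.register(Location(conn, "akka://demo@host1:4000/user/a"))
      _ <- ls.unregister(conn)
      _ <- ls.register(Location(conn, "akka://demo@host1:4000/user/b"))
    } yield ()
  }
}
```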
So what is the first service doing? It is just subscribing to a name and watching registration events. And this is how the API looks: you get a handle to the in-memory service, just like a local library; the calls are all local, they don't go anywhere. What I do in the other process is: first I register an actor called A, after some time I unregister it, and then after some time I register an actor called B against the same name. So I should see three events. The first process is the listener, and the second one actually does the job. Once they form the cluster (I have disabled the logs), we start seeing the first, second, and third events. That is the simple demo. A slightly more interesting demo is looking at a hundred of them, just to give you a feel for the performance. Instead of one name I am watching 100, and I am registering and unregistering against 100 names. To give a visual impression I am introducing some delay, but that is only for the demo. You can see that within a couple of hundred milliseconds the batches of updates are transmitted across the cluster, so you would not even notice the difference. It is not real time, CRDT updates are propagated in batches, but that is completely acceptable. You can see the statistics of the CRDT updates as the registrations run and after they finish. About the team: some team members are here; we are doing the main development, and India's participation in the project is channeled through the Government of India. These are the links. It is open source work, so whatever we have done is on GitHub, and we try to document it well, because other subsystem teams will build on these frameworks. The location service is just one of the services we are building; we are building a lot of things. [Question about DNS.] Consul can provide service discovery via DNS using SRV records, RFC 2782, which is a standard that many service discovery mechanisms try to provide. If we had that concern, maybe we would have used it. If we extract this as a general-purpose service discovery library, to be used by any project, then we would probably have to come up with some standard like that; it is a fair question. [Question about Consul's network tomography.] Network tomography gives you estimates between nodes, and it is part of the gossip protocol; for our use case we probably don't need that so much. [Question about removal.] I think I very cleverly talked only about add on the set, because it is easy to explain; remove is not. But I will give you the intuition: there is a notion of tombstones. It gets very technical, and I won't put you through the proofs, but the intuition is this. Instead of having just one set, you keep two sets: one where you record the additions, and one where you record the removals. Whenever you want to list the value, you take the difference. If we gave you only that, it would not be usable, because once an entry is removed it can never be added again. So that alone is not enough, but you get the picture: there are very smart mechanisms for implementing remove that still give you the guarantees. A sketch of the two-set idea follows below.
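The two-set idea from that answer, usually called a 2P-set, can be sketched like this. The limitation just mentioned (an element, once removed, can never come back) is visible directly in `value`:

```scala
// A 2P-set: one grow-only set of additions, one of removals (tombstones).
// Merge is still just union on both halves, so all three properties hold.
final case class TwoPSet[A](added: Set[A], removed: Set[A]) {
  def add(a: A): TwoPSet[A]    = copy(added = added + a)
  def remove(a: A): TwoPSet[A] = copy(removed = removed + a) // tombstone
  def value: Set[A]            = added diff removed          // remove wins, forever
  def merge(other: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added union other.added, removed union other.removed)
}
```

Smarter designs, such as the OR-set, tag each addition with a unique identifier so that re-adding after a removal works.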
[Continuing the earlier question about the two CRDTs.] Our register sits as a value inside a map, so any operation we do on that map is atomic; that is fine. But it so happens that we have to keep that register outside the map as well, just for the tracking need, so we are actually dealing with two structures, and the moment we wrap the register inside the map, I am not able to track just that one value. That is the limitation of the implementation right now, but yes, once we can compose them like that, it will solve the problem. [Question.] Thank you for the talk. You are using a last-write-wins register and map. How do you determine which is the last value if there are two concurrent updates to the same element? It is based on timestamps, which is crude if you just think about it, because timestamps are not reliable. But imagine what happens in our use case: each HCD, by design, is started on a designated machine, so we expect that no other machine is trying to start it. Now assume that by some mistake two machines try to start the same device daemon: whichever wins, we don't care, because they are serving the same purpose. If someone does mischief, if against, say, a Galil device driver someone registers something else entirely, that can happen, but we are okay with that risk. But yes, it is a good question: in general, last write wins based on timestamps is not a good strategy if you really need to depend on the outcome. [Question: did you write the CRDTs yourselves?] No, we did not write our own CRDTs. Things are going more distributed, and frameworks themselves are starting to ship this kind of library. We used the CRDT library which Akka provides and implemented a service discovery API on top of it; our talk is about how well that worked. Akka is also moving toward offering more of this as a built-in service, including the node-failure handling; they are working on it. [Question: is Akka Cluster using the same mechanism for service discovery?] That is a very interesting question. Akka Typed, if you are following it, so far has no notion of a remote actor: you have local actors and clustered actors. To discover a local actor, you hold a reference; to discover a remote or clustered actor, you need some mechanism, and any such mechanism is service discovery. They do provide one built into Akka Typed, but it works only with actors, and they call it the receptionist. If you look at the Akka Typed receptionist, you will see CRDTs being used for a very similar reason, and it came much later, after our implementation. And there is another thing we heard from Lightbend about Akka Typed...
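For reference, the Akka Typed receptionist mentioned in that last answer looks roughly like this. A minimal sketch; the ServiceKey and message type are invented:

```scala
import akka.actor.typed.Behavior
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.Behaviors

object ReceptionistSketch {
  final case class Ping(msg: String)
  val PingKey: ServiceKey[Ping] = ServiceKey[Ping]("ping-service")

  def pingService: Behavior[Ping] = Behaviors.setup { ctx =>
    // Register this actor against the key; on a cluster the receptionist
    // replicates registrations with Distributed Data (CRDTs) under the hood.
    ctx.system.receptionist ! Receptionist.Register(PingKey, ctx.self)
    Behaviors.receiveMessage { ping =>
      ctx.log.info("got {}", ping.msg)
      Behaviors.same
    }
  }

  def finder: Behavior[Receptionist.Listing] = Behaviors.setup { ctx =>
    // Subscribe: push-based updates whenever the set of instances changes.
    ctx.system.receptionist ! Receptionist.Subscribe(PingKey, ctx.self)
    Behaviors.receiveMessage { listing =>
      ctx.log.info("instances: {}", listing.serviceInstances(PingKey))
      Behaviors.same
    }
  }
}
```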