So welcome. My name is Unmesh, and this is my colleague Saloni. For the last couple of years we have been involved in a very interesting project called the Thirty Meter Telescope, building some of the software for it. We'll be talking about some of our learnings and some details of how the actor model was well suited for developing the software systems for this telescope. We are both from ThoughtWorks. How many of you are from ThoughtWorks today? Oh, just one. That's good, actually. So in the first part I'll talk briefly about what TMT is and the overall suite of software systems that we are building for it, and then Saloni will talk specifically about typed actors and how we used them to build the common framework for TMT. TMT stands for Thirty Meter Telescope, and it's one of the largest optical telescopes being built in the world. There are two others; one is being built by the European Union. This one is a collaboration of five countries: the US, Canada, China, Japan, and India. It is planned to be operational in 2027, so nine more years to go; if everything goes on schedule, maybe more. I will talk briefly about the program summary; the timelines are pretty staggering for us enterprise developers. If you want more information, there is a lot available at TMT.org, with regular updates on the project, the timelines, and the objectives of building this telescope. Now, just to give some sense of how big the telescope is going to be, this is a diagram I took from Wikipedia comparing the mirror sizes of all the optical telescopes in the world. In the right corner here is a baseball ground, and this is a tennis court. So in comparison to a baseball ground and a tennis court, these are the mirror sizes. And at the extreme right, the top one is the Thirty Meter Telescope.
So that's the diameter of the primary mirror of the telescope. The second one is bigger than that; it's the one being built by the European Union. And then there is a third one, the Giant Magellan Telescope, which is slightly smaller. These three, if you see, that's the size of just the mirror. It's bigger than this room. And you can imagine that if the mirror is so huge, you will need a structure that's a lot bigger. You see this diagram here: it's like a multi-storey building hosting that mirror and all the control structures and instruments to control it. So that's the thing being built. And as you can imagine, to build something this big, a 30 meter mirror, no one has built that kind of mirror to date; technologically it's not possible as a single piece. So if you see in this diagram, it's not a single mirror. It's close to 500 smaller mirror segments giving the effect of a single huge mirror. As of now, the largest operational telescope has a 10 meter mirror. So 30 meters, and around 40 meters for the European Union one, is a miracle at this point; no one has done these kinds of things. So, a quick program summary to give some context. It's a program more than 20 years long. It started in 2007, the telescope will be operational in 2027, and it's expected to operate for 50 more years after that. That's an interesting thing to know, because the software choices that you make from 2007 onwards need to stay relevant for 50 years after 2027; someone will need to support them. Now, the software development is primarily happening in India; India is the fifth partner. And we have been building prototypes for TMT since 2014 using Akka, to see if the actor model is a good fit for this domain.
Now, along with the actor model, and Saloni will specifically be talking about the actor model, we are building a suite of software systems called the Common Software for TMT. As you can imagine, it involves a discovery mechanism; we talked about that service discovery mechanism here last year. Then there is a mechanism to maintain software configuration for the telescope. There is an event service to get all the event and telemetry data from different parts of the telescope. There is a timer API; that work is still going on, and it's very interesting, so we'll probably talk about it next time. And then lastly, the framework. When I say framework: there will be multiple subsystems built by various teams across these five countries, and what we are building is a core framework that everyone will use for developing their systems. It's based on typed actors. Now, I'll briefly talk about the architecture of the telescope itself so that we get more context on why actors are better suited. It's a layered architecture, as you can imagine. At layer zero, there are the actual hardware components or hardware systems which control the telescope. The level above that is more like device drivers. So for controlling the mirrors, for the telescope control system, you will have drivers for the motors that control them. In our terminology, we call them HCDs, Hardware Control Daemons. Then there is a layer for each subsystem. There are various subsystems in the telescope, and you will have device drivers for each of the devices, but to control a whole subsystem, like a laser control assembly or the telescope control system, you have a subsystem-level component. That's called the assembly layer.
And then at the top, to coordinate all these subsystems, there is something called a sequencer, which is interfaced with the operational software. That software is essentially the UI used for submitting an observation to the telescope. So essentially, what happens is that through the operational applications, an observation specification is submitted to the telescope, and then through all these layers the telescope is controlled. For example, if you need to move the telescope to a specific region of the sky and then take pictures, let's say, every 500 milliseconds, adjusting as the Earth moves, all of those instructions will be sequenced through all these layers. Just to give a very specific example, and we'll give a demo of this later, we'll go through a concrete scenario. I talked about moving the telescope to observe a particular portion of the sky. That's done by the telescope control subsystem, which is called TCS. There is a component which is responsible for managing that subsystem. Let's say it gets a command saying: move the telescope to coordinates 2035, just a hypothetical value. Now, what it does is manage a set of components. This is not showing correctly on the slide, but there will be a set of assemblies that it manages, called Probam assemblies. Probam is a particular hardware subsystem which controls the motors. So it sends that command to this assembly. And this assembly component manages multiple motor devices, so it translates the command into commands that are sent to the device drivers which control the motors. It will also listen to the events coming from those devices, and then adjust all the motors accordingly. And the device drivers finally control the motors. So this is how the communication happens between all these components.
Now, the key characteristics of this system, and how it differs from, let's say, the typical services we build in the enterprise, is that this is a peer-to-peer system. Components like these assemblies and HCDs are peers; there is no client-server relationship. They need to discover each other, and they need to send commands and subscribe to events. The communication between all these components is asynchronous. It's not a synchronous call where you send and wait for a response; you subscribe to events instead. And the last but important thing is that all these components are stateful. Each hardware control daemon, or device driver, maintains the state of a hardware device. Each subsystem-level component maintains the state of that subsystem. So each device driver controlling a motor needs to know the current position of that motor, and as a subsystem, you need to know the coordinates of that part of the telescope. This makes them difficult to manage, because managing state, as you know, can be a nightmare. So concurrency and safety of that state are important concerns. Now, if you look at actors, and many of you have used actors, these characteristics of a peer-to-peer system fit very well with the actor model. The actor programming model is, by its very nature, message passing. And actors do manage state. Actors have their own state, which can be managed without synchronization, because it is guaranteed that only one thread at a time ever accesses that state. Actors also have a very interesting fault tolerance mechanism called supervision. You always have a supervisor watching a given actor, and you can have various supervision strategies, maybe to restart the actor if it fails, or to resume it. That's very critical in these kinds of components which manage state, because if something fails, you have to make sure you always start again in a clean state.
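These three properties of the actor model, asynchronous message passing, lock-free state, and FIFO processing, can be illustrated with a toy sketch. The real TMT framework is written in Scala on Akka; this is just a stdlib Python illustration, and all names (`MotorActor`, the message tuples) are invented for the example.

```python
import queue
import threading

class MotorActor:
    """Toy actor: a single thread drains a FIFO mailbox, so the state
    (the motor position) needs no locks even under concurrent senders."""

    def __init__(self, position=0):
        self.position = position          # touched only by the actor thread
        self._mailbox = queue.Queue()     # FIFO: messages handled one at a time
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send: the caller never blocks on processing."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            kind, value = message
            if kind == "move":
                self.position += value    # no synchronization needed

    def stop(self):
        self.tell("stop")                 # queued after all earlier messages
        self._thread.join()

actor = MotorActor()
for _ in range(1000):
    actor.tell(("move", 1))
actor.stop()
print(actor.position)  # 1000
```

Because `stop` travels through the same FIFO mailbox, it is processed only after every `move`, which is why the final position is deterministic without any locking.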
So if a particular software component managing a motor fails, you need to make sure that when you restart it, it has the correct position of that motor, and that its state is not corrupted because of the fault that happened. The last but important property is location transparency. With Akka actors, you can access remote actors the same way you access local actors, with an actor reference. That's crucial, because all these hardware components, the daemons and assemblies, can be scattered across a cluster of machines. So you need to be able to access these actor references remotely without much hassle. Now, with remote actors accessed through actor references, discovery becomes an important concern. People who have used microservices, particularly in the Docker world, know that service discovery is a very important concern. For actor references, we had to build something very similar for discovering all these actor references at runtime. I will briefly talk about how these things are deployed, and why discovery is needed. Essentially, these hardware control daemons and all the software above them run on the JVM, because we are using Akka, and the JVM runs on Linux. So you have these Linux machines attached to specific hardware, and these components are deployed on those machines. Now, what we run on each of these machines is an agent. It's a JVM agent using Akka, and these agents form a cluster among themselves. So each hardware machine will have this agent running on it, and together they form a cluster. These agents expose a service registration and discovery API. The implementation is based on CRDTs, conflict-free replicated data types. And each of the component actors runs in its own JVM.
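The registration-and-discovery idea the agents provide can be sketched in a few lines. The real TMT service is distributed and CRDT-based; this is a single-process Python illustration, and the names (`LocationService`, `probe-hcd`, the address string) are made up for the example.

```python
class LocationService:
    """Toy discovery sketch: components register a name, and interested
    parties get a callback event when that name comes up."""

    def __init__(self):
        self._registry = {}   # component name -> address
        self._watchers = {}   # component name -> list of callbacks

    def register(self, name, address):
        self._registry[name] = address
        for callback in self._watchers.get(name, []):
            callback(name, address)            # push an event to each watcher

    def resolve(self, name):
        return self._registry.get(name)

    def watch(self, name, callback):
        """Register interest; fires immediately if the component is already up,
        so start order between components does not matter."""
        self._watchers.setdefault(name, []).append(callback)
        if name in self._registry:
            callback(name, self._registry[name])

events = []
service = LocationService()
service.watch("probe-hcd", lambda n, a: events.append((n, a)))
service.register("probe-hcd", "akka://cluster@10.0.0.5/user/supervisor")
print(events)  # [('probe-hcd', 'akka://cluster@10.0.0.5/user/supervisor')]
```

The immediate-callback branch in `watch` is what lets peers start in any order, which is the property the talk emphasizes.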
As they come up, the components register themselves through the registration API that these agents expose. We are also using server-sent events because, as I said, this is a peer-to-peer system, and all these components and JVMs can start in any order. So if you are interested in a particular component, you need to know whether that component is up or not. For that, each component registers its interest in other components, and it gets events every time a component is registered or its state changes. This is very similar to Consul or etcd, if you are aware of how those work with Docker containers. So this is the overall system and what we are building. I'll now hand over to Saloni to go into the specifics of how typed actors are used to build our framework. Thank you.

Thank you, Unmesh. Am I audible? OK. So, we need to understand the anatomy of a TMT component. A component in TMT is a sequencer, an assembly, or an HCD, as Unmesh has already explained. An assembly needs to discover an HCD, send commands, and handle the responses. The assembly does so by using the framework provided by us, "us" being the team at ThoughtWorks Pune that is developing this framework out of actors. All the component writers, the sequencer, assembly, and HCD writers, will use our framework to implement the actual code for their components. To start with, the Supervisor is the first thing that gets created. The Supervisor is an actor provided by our framework, and it is the first thing created in a component. It acts as the front face of a component, and that's why any other component discovers this component via its supervisor. The Supervisor goes on to spawn a top-level actor. This is, again, an actor provided by us. And the top-level actor goes on to initialize the handlers. The handlers are something that component writers will write; handlers are a set of actions.
The relationship between the top-level actor and the handlers is that of a template method. I'll explain what the handlers are and what kind of actions I'm talking about. This is the code for the handlers. It is an abstract class provided by us, with actions like initialization of a component, validation of a received command, execution of a command, and shutting down of a component. So handlers are a bunch of actions. Component writers will extend our component handlers class and provide the implementations of these actions. Going back to the presentation: we have seen that handlers are a set of actions, and the top-level actor decides the sequence of executing these actions. For example, at the startup of a component, the top-level actor calls the initialize action on the handlers. The handlers, as part of the initialize action, can spawn one or more worker actors. Worker actors are, again, actors implemented by the component writers. The job of the workers is to execute the commands received. As part of this command execution, a worker may need to publish certain events on behalf of the component, for example the current position. To do so, the worker uses an actor provided by our framework, the PubSub manager. This actor is responsible for bookkeeping the list of subscribers interested in the events that this component is publishing. Whenever a worker publishes an event, the PubSub manager notifies all the interested subscribers. And once the command execution is complete, the worker needs to mark this completion somewhere. To do so, the worker uses another actor provided by us, the command response manager actor.
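The template-method split just described, where the framework fixes *when* the actions run and the component writer fills in *what* they do, can be sketched like this. The real framework is Scala/Akka; this Python sketch uses invented names (`ComponentHandlers`, `TopLevelActor`, `MotorHandlers`) and is not the framework's actual API.

```python
from abc import ABC, abstractmethod

class ComponentHandlers(ABC):
    """Framework side: the abstract set of actions a component writer
    must implement (the handlers described in the talk)."""

    @abstractmethod
    def initialize(self): ...

    @abstractmethod
    def validate_command(self, command): ...

    @abstractmethod
    def on_submit(self, command): ...

    @abstractmethod
    def on_shutdown(self): ...

class TopLevelActor:
    """Framework side: decides the *sequence* of handler calls,
    which is the template-method part."""

    def __init__(self, handlers):
        self.handlers = handlers
        self.handlers.initialize()        # always called at component startup

    def submit(self, command):
        if self.handlers.validate_command(command):
            return self.handlers.on_submit(command)
        return "Invalid"

class MotorHandlers(ComponentHandlers):
    """Component-writer side: fills in the actions for one component."""

    def initialize(self):
        self.position = 0

    def validate_command(self, command):
        return command[0] == "move"

    def on_submit(self, command):
        self.position += command[1]
        return "Completed"

    def on_shutdown(self):
        pass

tla = TopLevelActor(MotorHandlers())
print(tla.submit(("move", 5)))   # Completed
print(tla.submit(("halt", 0)))   # Invalid
```

The component writer never calls `initialize` or `validate_command` directly; the top-level actor owns that sequencing, which is exactly the inversion of control the template method pattern provides.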
This actor is responsible for bookkeeping all the subscribers interested in knowing about the completion of the command being executed in this component. So once the command is complete, the worker simply goes ahead and marks the completion in the command response manager actor. This is pretty much the gist of what goes into the creation of a TMT component. Any questions so far? Yeah. So there is a supervision strategy that comes into play, and I'll talk about that in this slide. Let's try to understand the supervisor in a little more detail. The first thing the supervisor does is register itself with the Akka CRDT. Next, all the communication to the component goes via the supervisor. Since it is the front face, all messages are first received by the supervisor, and that's why it becomes the logical place to put all the filtering and validation that a component requires. The supervisor is also responsible for handling the life cycle of the component. Let's try to understand this with the diagram over here. Let's say an exception occurs in the handlers. The exception will bubble up to the top-level actor, and again bubble up to the supervisor. The supervisor applies the fault-tolerance strategy: it restarts the top-level actor and brings the component back to a clean slate. Once the top-level actor is restarted, it does its job of initializing the handlers, so initialization happens again. If you notice here, after the restart, the actor reference of the top-level actor changes, because it is started afresh. But the supervisor remains the same, and it is least affected by the exception that was generated. That's why it is preferable that the supervisor's actor reference is registered as the address of this component, and not that of any other actor in this diagram.
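The restart behavior just described, where the supervisor keeps a stable identity while the top-level actor it owns is thrown away and re-created on failure, can be sketched as follows. This is a hand-rolled Python illustration, not Akka's supervision implementation; the class names and the `"boom"` message are invented.

```python
class Supervisor:
    """Sketch of the supervision idea: catches a child failure and
    re-creates the child, re-running its initialization."""

    def __init__(self, make_top_level_actor):
        self._make = make_top_level_actor
        self._tla = self._make()          # initial creation

    def deliver(self, message):
        try:
            return self._tla.receive(message)
        except Exception:
            self._tla = self._make()      # restart: fresh actor, clean state
            return "restarted"

class TopLevel:
    """Stand-in for the top-level actor; __init__ plays the role of
    the initialize handler that runs again on every restart."""

    def __init__(self):
        self.state = "initialized"

    def receive(self, message):
        if message == "boom":
            raise RuntimeError("handler failure")
        return self.state

sup = Supervisor(TopLevel)
first = sup._tla
print(sup.deliver("ping"))    # initialized
print(sup.deliver("boom"))    # restarted
print(sup._tla is first)      # False: the top-level actor's reference changed
```

Note that `first` is stale after the restart while `sup` itself is unchanged, which is exactly why the talk recommends registering the supervisor's reference, not the top-level actor's, as the component's address.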
[Audience question, inaudible.] And to answer your question, even if an exception occurs at any actor level, the same strategy is applied. Sorry, I didn't get that. Yes, those are actor messages. Across components, if actors restart, you will get events through the agents, through the discovery mechanism we just talked about. So we'll know that something is down and getting restarted; it re-registers with a new reference, and each component then updates its copy of the reference. Yes, and we are fine with that. If a supervisor dies, then we have a death watch mechanism running in the cluster. We have death watch actors running on more than one node; they will figure out that a supervisor is down, update the CRDT, and every interested party will be notified. [Further audience discussion, partly inaudible.] Essentially, for things like this we need operational strategies. A simple restart of an actor will work in most cases, but a more fundamental error will kill it every time. For that we have an alarm mechanism: if it keeps happening again and again, we raise an alarm to an operator. What is described here is just the basic supervision for these kinds of failures.
So, to expand on Unmesh's point a little: if a handler dies and comes up again, it is going to read the default configuration that it wants for the hardware, and it will again command the hardware into the default position. That's why we don't need any state at the actor level right now. Moreover, the use case you are describing is pretty much about initialization. If something goes wrong there, it is only at initialization, and no command execution was in flight. Command execution has other mechanisms, hardware readings as well as alarms, coming into play. Okay, so we saw that the supervisor registers itself with the Akka CRDT. Being the front face, it holds the filtering and validation code. It provides the restart strategy. It also provides the admin interface: whatever operations a system operator or maintenance staff wants to perform, like restarting a component, shutting down a component, or changing the log level, are also implemented at the supervisor level. Next, I had put in this slide thinking that most of the audience might not know what an actor is, but since we all know what actors are here, I'll go through it quickly. An actor is a unit that executes messages one at a time. You can send messages to an actor, and these messages are saved in a queue. The actor picks the messages from the queue one by one, in FIFO order, and this gives me thread safety while managing state. That is our mental model when it comes to actors. So now I'll talk about the role of typed actors in TMT. Typed actors give me an additional piece of information, a type, when I'm defining the actor, and this type is the set of messages that the actor will understand in its lifespan.
So actors were already helping me define the communication protocol between components in TMT, but typed actors make this communication protocol explicit at compile time. If I define an HCD of type A, that means the HCD understands messages of type A, and if an assembly tries to send a message of type B, the assembly gets a compile-time error. That is what typed actors give me: compile-time safety. Typed actors come in two flavors, one mutable and one immutable. The mutable flavor allows me to update the state of the actor while still giving me thread safety, like the plain old actors we have known. The immutable version forces me to create a new behavior of the actor whenever I want to update the state. Currently we are using the mutable version of typed actors. On the diagram, all the actors in blue, the ones provided by our framework, are currently mutable actors. And as we have already seen, we also register the typed actors in the Akka CRDT. So we have understood the usages of typed actors in TMT. With this understanding, let's look at our learnings from using typed actors for more than a year now. The first one is that sealed message hierarchies for typed actors have turned out to be quite rigid. If I have to spell out the type that a supervisor understands: for an HCD, the supervisor needs to understand all the messages that assemblies and other entities in the external world will send it. So the supervisor needs to know the external-world entities' messages. The supervisor also needs to understand messages sent by the actors running internally in the component. That is another set of messages the supervisor needs to understand.
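The immutable flavor just mentioned, where handling a message returns the *next* behavior instead of mutating state in place, can be sketched without any actor library. This is a Python illustration of the style, not Akka Typed's API; the `counter` behavior and its messages are invented.

```python
replies = []   # stands in for replies sent back to an asker

def counter(count):
    """Immutable-style behavior: the state lives in the closure, and
    each message handler returns the behavior for the next message."""
    def receive(message):
        if message == "increment":
            return counter(count + 1)   # new behavior closing over new state
        if message == "report":
            replies.append(count)
            return receive              # same behavior, state unchanged
        raise TypeError(f"unhandled message: {message!r}")
    return receive

behavior = counter(0)
for _ in range(3):
    behavior = behavior("increment")
behavior = behavior("report")
print(replies)  # [3]
```

There is no assignment to any field anywhere: "updating state" means swapping in a new behavior, which is the property that makes this flavor easy to reason about. The `TypeError` on an unknown message is a runtime stand-in for what the typed API catches at compile time.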
And apart from these two sets of messages, there is one more set based on the state of the component, which means that if the HCD starts in the idle state, or is in the restarting state, and an assembly tries to execute a command on it, the HCD will simply discard it. So right now the supervisor has the external-entity set of messages, the internal messages, and the state-based messages. And since we are using sealed hierarchies for defining messages, I end up putting all three sets of messages in a single file. This single file has turned out to be quite large and a headache for maintenance, and that's why we feel it's quite a bit of tech debt. What could be the solution? Why not have union type support in the language? That would allow me to put the three sets of messages in their own separate files, and when I go to define the supervisor, I define its type as the union of the external messages, the internal messages, and the state-based messages. And I get the modularity that I'm aiming for. The next learning is that we are using the mutable behavior of actors because it is similar to the normal actors we have been using so far; it is similar to POJO classes in Java. But having used mutable actors for more than a year now, we feel that immutable actors are good enough. They would give us everything that we want. The next learning is that the type information of an actor reference is not preserved during serialization. I'll explain this point in detail. An assembly needs to discover an HCD, and that's why the HCD registers its supervisor's actor reference in the Akka CRDT. At the point of registration, the actor reference gets serialized into the Akka CRDT. And at the serialization point, only the string representation of the actor reference is captured; the type of the actor is not captured there. It is simply lost.
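The union-type idea can be illustrated with Python's `typing.Union`, which composes independently defined message families without one giant sealed file (Scala only gained `A | B` union types later, in Scala 3). The message classes here are invented stand-ins for the three families named in the talk.

```python
from dataclasses import dataclass
from typing import Union

# Each message family could live in its own module, matching the
# modularity the talk is asking for.

@dataclass
class Submit:          # sent by assemblies / the external world
    command: str

@dataclass
class WorkerDone:      # sent by the component's internal actors
    worker_id: int

@dataclass
class GoOffline:       # state/lifecycle messages
    pass

# The union glues the families together at the supervisor's definition,
# instead of forcing them into one sealed hierarchy.
SupervisorMessage = Union[Submit, WorkerDone, GoOffline]

def handle(message: SupervisorMessage) -> str:
    if isinstance(message, Submit):
        return f"validating {message.command}"
    if isinstance(message, WorkerDone):
        return f"worker {message.worker_id} finished"
    return "going offline"

print(handle(Submit("move")))   # validating move
print(handle(WorkerDone(7)))    # worker 7 finished
```

A type checker such as mypy would flag `handle(42)` at check time, which is the compile-time modularity the talk wants from a language-level union.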
So when an assembly tries to discover the HCD and deserialize the reference, the assembly has to manually cast the type of the HCD's actor. If the HCD's supervisor understands messages of type A, the assembly has to manually cast the reference to type A at the point of deserialization. And if the assembly makes a mistake and casts it to type B, no error gets thrown at the point of deserialization. It is only when the assembly sends the first message to the HCD that the HCD will throw an error saying: you have sent me a wrong message of type B; I can only understand messages of type A. That is quite late in the whole process to find out about this problem. So what could be done to make this less risky? Why not capture the type as well while the supervisor's actor reference is being serialized? If the type were captured there, then at the point of deserialization I could determine the type, and it would be less risky than what we have right now. So, to summarize my learnings: we have sealed message hierarchies, which are turning into quite a large file for us; we are using mutable actors, which has worked well, but we now feel immutable actors would serve just as well; and the type information of an actor reference is right now not captured anywhere. Okay. Unmesh explained a little of this diagram while he was talking, and I'll try to demonstrate the same thing. We have three blocks here: the TCS assembly, the Probam assembly, and the Prob HCD. The Prob HCD is a hardware control daemon which is close to the hardware; it is responsible for talking to the hardware. The Probam assembly is responsible for talking to the Prob HCD. And the TCS assembly is responsible for controlling and managing the positioning of the hardware in the telescope system; TCS means telescope control system here. So let me explain the timers shown in the diagram.
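The late-failure problem just described can be made concrete with a toy model of a typed reference whose type is dropped at serialization. All names here (`TypedRef`, `HcdSupervisor`, the `akka://tmt/...` path, the command classes) are invented for the sketch; this is not Akka's serialization machinery.

```python
class MoveCommand(str):
    pass

class ReadCommand(str):
    pass

class HcdSupervisor:
    """The real actor behind the registered path: it only understands
    MoveCommand, playing the role of 'type A' in the talk."""
    def deliver(self, message):
        if not isinstance(message, MoveCommand):
            raise TypeError(f"wrong message type {type(message).__name__}")
        return "moving"

actors = {"akka://tmt/user/hcd-supervisor": HcdSupervisor()}

class TypedRef:
    """Stand-in for a typed actor reference; 'accepts' plays the role
    of the compile-time message type."""
    def __init__(self, path, accepts):
        self.path, self.accepts = path, accepts

    def tell(self, message):
        assert isinstance(message, self.accepts)   # the caller-side type check
        return actors[self.path].deliver(message)

# Registration serializes only the string path; the type is lost here.
serialized = "akka://tmt/user/hcd-supervisor"

# Deserialization: the assembly must cast manually, and the wrong cast
# to ReadCommand raises no error at this point...
ref = TypedRef(serialized, ReadCommand)

# ...the mistake only surfaces when the first message is sent.
try:
    ref.tell(ReadCommand("read current position"))
except TypeError as error:
    print("late failure:", error)
```

Carrying the accepted type alongside the path in `serialized`, as the talk proposes, would let the deserializer reject the bad cast immediately instead of at the first message.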
The bottom one is the hardware-level timer, which ticks at a certain frequency; at each tick, the hardware publishes its current position. So all the software, all the assemblies that are interested, can listen to the hardware positions and determine their tasks. The timer at the top is for the TCS assembly. The TCS assembly uses this timer to generate messages at a certain frequency, so that it does not bombard the Probam assembly with a lot of messages at the same time. With this understanding, let's see how a typical use case in TMT works. First, the timer of the TCS assembly goes off. It produces some coordinates, and these coordinates are sent to the Probam assembly. The Probam assembly simply updates its demand state and waits. When the timer of the Prob HCD goes off, it generates the current position of the hardware and sends this current position to the Probam assembly. The Probam assembly updates its current state. Now that the Probam assembly knows both the current state and the demand state, it can calculate the next move for the Prob HCD. Say it calculates 1025 as the next move. It sends this next move to the Prob HCD, and the Prob HCD just updates its next move to 1025. The Prob HCD does nothing else and sits there, waiting for its timer to fire again. When the timer fires again, it uses this next move, 1025, to command the hardware to that position. It generates the next position and publishes it to the Probam assembly. That is the typical use case of what we see in TMT. I'll also try to demo it here. As you can see, we have four screens available: a cluster seed, the TCS assembly, the Probam assembly, and the Prob HCD. The cluster seed is something we need to start first; it is going to be our introducer node in the Akka cluster. Is it visible? I'll try that. I'm keeping an eye on the time, so I can't spend too long here. The first thing I'm doing is starting the cluster seed.
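The demand-state/current-state loop walked through above can be sketched as below. The class and method names are invented, and the "step halfway toward the demand" rule is a made-up stand-in for whatever control law the real assembly uses; only the message flow mirrors the talk.

```python
class ProbamAssembly:
    """Sketch of the loop: TCS pushes a demand position on its timer,
    the HCD publishes its current position on the hardware timer, and
    the assembly derives the next move from the two."""

    def __init__(self):
        self.demand = None
        self.current = None
        self.next_move = None

    def on_demand(self, position):
        """Called when the TCS assembly's timer fires with new coordinates."""
        self.demand = position
        self._recompute()

    def on_current(self, position):
        """Called when the HCD publishes the hardware's current position."""
        self.current = position
        self._recompute()

    def _recompute(self):
        # Only act once both sides of the loop have reported in.
        if self.demand is not None and self.current is not None:
            # Made-up control law: step halfway toward the demand each tick.
            self.next_move = (self.demand + self.current) // 2

assembly = ProbamAssembly()
assembly.on_demand(2035)    # TCS timer fires with the demanded coordinates
assembly.on_current(15)     # hardware timer publishes the current position
print(assembly.next_move)   # 1025
```

Note that the assembly stays passive between timer ticks, exactly as in the walkthrough: it only recomputes when one of the two inputs arrives, and the HCD in turn only acts on `next_move` at its own next tick.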
Trust me, it has started successfully. [Answering an audience question.] The components will register with that agent. The agents need to be part of the cluster; the other components do not. The agents form their own cluster, separate from the components. And within each assembly, HCD, and so on, there will be a supervisor; each component has its own supervisor. Let me try one more time. If it doesn't work, you'll have to believe that it works, because it worked just before the presentation. It never works on stage, even after rehearsal and preparation. I think your network is not working. No, no, it is working. But somehow, let me try this. It should probably work now. Yeah, there you go. Yay, we have a successful registration of the TCS assembly. The assembly has a ticker going; we are not logging it right now because it was flooding our screen. Next I'll start the Prob HCD. Oh yes, it is all simulated. So the HCD is also registered successfully. Once I start the Probam assembly, it will start receiving the demand state from the TCS assembly and the current state from the Prob HCD. If you look here, it received the 1025 from the TCS assembly, and it is sending the current state to the Prob HCD. And if you see here, the Prob HCD is continuously receiving a move command: go to the next position. That's all for the demo. Let me go back to my presentation. I'll briefly talk about our team. We work from the ThoughtWorks Pune office; this is our India team. We have one more team working from the Indian Institute of Astrophysics in Bangalore, and our head office is in Pasadena, in the US. Our code is open source; feel free to check it out and play around with it. Hope you enjoyed our session. Thank you.