So thank you for being here. Today we're going to talk about the SEAPATH project, which is a project under the LF Energy umbrella. Let me present myself first. I'm Aurélien Watai, and I work for RTE, the French TSO. If you don't know what that is, it's the entity in charge of the transmission of electricity in France. And this is my colleague Florent Caly, who is an IT expert at RTE.

A bit of context. The grid used to be the same for the last 30 years. It was pretty stable: we had nuclear power plants and the same substations, and everything was fine. But the world is changing, and we are moving toward renewable energy and low carbon impact. So there are more and more renewable energy sources on the grid, and as a TSO, we need to integrate those new sources of energy into our infrastructure. The place to do so is the substation: it is the key connection point for renewable energy sources. For those who don't know, a substation is the small local brain where you can connect different energy sources and where you manage and distribute power. So you have a lot of command and control functions, and also high-voltage equipment and protections. The substation is where you have the SCADA, the Supervisory Control and Data Acquisition system.

As I was saying, the traditional system was built for stability and longevity, to work with nuclear power plants. Basically, it was the same in every substation in the whole country; it was very standardized and nothing changed. But with renewable energy sources and solar panels, there are new requirements and it's evolving all the time, so we need to be far more flexible. We need to move from the traditional approach that you can see on the left, a very fixed control and command system built to stay the same for years, to something far more flexible. That's what drove us to virtualization: we needed to benefit from the OT world and also from the IT world to be able to move fast in this new environment.

So the goal of the SEAPATH project is to build the backbone of this new system: a real-time virtualization platform on which you can host all the virtualized critical control system applications. Let's talk about the platform. We don't code anything; the project is an integration project. Everything already exists: the KVM hypervisor, PREEMPT_RT for Linux real-time, Pacemaker, Corosync and Ceph for the clustering, and Open vSwitch for the network. The goal of the project is to let the user build a ready-to-run VM cluster.

The other constraint we have is that command and control is always multi-vendor. For safety reasons you cannot have, for example, the main A protection and the main B protection from the same vendor; there is always a vendor A and a vendor B. That's why we talk about virtual machines: since you have different vendors, virtual machines are easier than, for example, containers, because each vendor has to maintain its own environment.

So the key requirements for the platform are virtualization, of course, and real-time performance. If we talk about distance protection, the protection that protects the lines, it works in real time: the acquisition period is around 250 microseconds, and the operating time is around one to three milliseconds (a rough latency-check sketch follows below).
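As an illustration of what "real-time enough" means here, this is the kind of latency check one might run on such a host with the standard cyclictest tool. A minimal sketch only: the priority, interval and duration values are illustrative assumptions, not the project's actual test parameters.

```bash
# Measure worst-case scheduling latency on a PREEMPT_RT host:
# one SCHED_FIFO measurement thread per core, woken every 250 us
# to mirror the sampled-value acquisition period mentioned above.
cyclictest --mlockall --smp --priority=95 --interval=250 --duration=10m
# The reported max latency must stay well under the 1-3 ms operating budget.
```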
You also need high availability: of course, if you lose one server, you need to keep the protections running on the system. You rely a lot on time synchronization. And you need good networking performance, because there is a lot of data in a substation. So now I will let Florent talk in more detail about the platform.

Yes, let's dive into the meat of the project. Once again, everything already exists, and the SEAPATH strategy is to integrate, and to provide code to configure everything very easily. We are not coding any of the functions; we are relying on already existing and stable software.

A deployed cluster might look something like this, with several hypervisors. You must have an odd number of machines for quorum purposes, so you cannot have just two hypervisors: three, five, or whatever. We use Corosync and Pacemaker to do that. Ceph gives you the ability to replicate data from one machine to another, and the VMs have their disks stored on that distributed storage. If a VM has to be moved to the other side because of a problem, it will find its data there, because it has been replicated.

So, the technologies we are using: either Yocto or Debian, then a lot of Ansible, because we do infrastructure as code, and everything we script to integrate those bricks is done with Ansible. We do tests and continuous integration with GitHub Actions. For the out-of-the-box software in the cluster we use, like I said, real-time Linux, KVM, QEMU, Corosync, Pacemaker, Ceph, and Open vSwitch for all the internal networking, so the VMs can communicate with each other and with the outside. So that gives you a view of the technologies we are using. And there are a lot of other out-of-the-box technologies: OpenFlow, if you want to do some cybersecurity between the VMs; XDP and SR-IOV for network performance; STONITH for cluster stability; and other custom tools. We will get into that a bit later.

So, let's review the prerequisites. We need the platform to be real-time. In our case we chose PREEMPT_RT. As many of you know, it's not hard real-time, but it's real-time enough for what we need to do. And it's getting traction: it's in the mainline kernel now, almost entirely. There are some things left to be done, but it's no problem for us to patch, or to use an already-RT kernel like the Debian kernel, which has a real-time variant. So we use the PREEMPT_RT kernel, and then we use all sorts of methods for resource isolation. On the CPU, for instance, we use either dedicated cores or isolation via isolcpus, or cpusets, the cgroup way of doing it; CPU pinning for the VMs; priority scheduling, like FIFO scheduling with different priorities. We use the control groups a lot, and everything Linux has to offer to make the isolation better, basically (a small sketch of this follows below).

On the virtualization side: there has been a lot of talk about Xen at this conference, but we chose KVM. At the time of the choice, KVM had a better reputation for low latency; with PREEMPT_RT, anyway, KVM was supposed to be better than Xen. And we chose VMs over containers, like Aurélien said, mainly because of the multi-vendor constraint. We need different vendors running applications on this cluster, and VMs make that very easy. We have a shared responsibility model: if it's in the VM, it's your problem; if it's not in the VM, it's our problem. It's also more versatile.
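To make the isolation toolbox just mentioned concrete, here is a minimal sketch of one possible combination on a Debian host. The core numbers, priority value and VM name (rt-vm) are made up for illustration; SEAPATH's actual playbooks may configure this quite differently.

```bash
# Reserve cores 2-5 at boot so the kernel keeps housekeeping work off them
# (Debian: append to the kernel command line, then regenerate GRUB config).
sed -i 's/^GRUB_CMDLINE_LINUX="/&isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5 /' /etc/default/grub
update-grub

# Pin a guest's two vCPUs onto the isolated cores (libvirt).
virsh vcpupin rt-vm 0 2
virsh vcpupin rt-vm 1 3

# Give the QEMU process a SCHED_FIFO real-time priority; per-vCPU threads
# can be set the same way, or declaratively via <vcpusched> in the domain XML.
chrt --fifo -p 80 "$(pgrep -f 'qemu.*rt-vm' | head -n 1)"
```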
If you want to provide a function running Windows, you can; with a container it would be a bit different. It's also easier for resource management: it's very easy to give a CPU to a VM, and to add one more CPU if the VM needs it. Basically, containerization alone was not an option for us right now, and VMs made things very, very easy.

The second thing we noted with virtualization, and this is the meat of the project: many people are doing virtualization, everybody's doing that, and many people are doing PREEMPT_RT, but doing PREEMPT_RT with virtualization is still kind of new. KVM-RT is still something you don't see very much. We've noticed it getting traction, though: for instance, we will be in Prague next month, and we've noticed the topic showing up at that conference. So we see other people investigating PREEMPT_RT in virtualized environments. We believe in it, and we think we're on the right path working on it.

About resilience: once again, we are not developing anything. We use Corosync and Pacemaker, and they can build very nice clusters. We use Ceph as a second layer, for the storage. On the network side, we also support PRP, which in our electric grid environment is almost a standard. If you don't know PRP: basically, every packet is duplicated on the network, the copies can take different paths, and in the end they are de-duplicated. So even if there is any kind of problem on one path, you won't lose a packet. It can also be implemented in hardware: you can buy a card that implements PRP at the hardware level. So this is something we had to support in SEAPATH.

Now, networking. There are two types of networking in SEAPATH. The first networking feature is for the VMs: when you implement a full system with different VMs that need to talk to each other, you need a very versatile networking stack, and Open vSwitch provides this. You can create bridges, you can plug the VMs in, you can have firewalls if you want, and you can do software-defined networking in your hypervisors. You can use VXLAN, so you can even extend virtual bridges from one SEAPATH machine to another. Basically, we can do everything we need with Open vSwitch; it's very nice. That is for VM-to-VM communication. But when a VM needs to exchange data with the external world, we support SR-IOV, which gives a VM very good performance on the network hardware because it bypasses the hypervisor: the VM gets access to the NIC with no overhead. We tried different things, we also tried DPDK, for instance, but in the end, for external communication, we think SR-IOV will be the target (a small Open vSwitch sketch follows below).
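To give a flavor of what Open vSwitch provides here, a minimal sketch: the bridge name, guest interface, peer address and VXLAN key are made up for illustration and are not SEAPATH's actual configuration.

```bash
# A bridge for inter-VM traffic; attach a guest's tap interface to it.
ovs-vsctl add-br br-vm
ovs-vsctl add-port br-vm vnet0

# Extend the same virtual bridge to a second hypervisor over VXLAN.
ovs-vsctl add-port br-vm vx0 -- set interface vx0 type=vxlan \
    options:remote_ip=192.0.2.12 options:key=100
```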
Now, PTP. Like Aurélien said, in the electric grid, and maybe elsewhere too, everything needs to be synchronized with PTP, the Precision Time Protocol. There is a standard way of using PTP in a substation, a whole architecture with transparent clocks, boundary clocks, GPS and everything. When you virtualize, it's a bit different, because the standard as written was designed for hardware: the hardware is supposed to be synchronized with PTP. For a VM you can't, or at least it's no longer optimal, synchronize every VM with PTP directly.

So the architecture we chose is to use linuxptp, of course, but to synchronize the host, the hypervisor, with PTP. We don't synchronize the VMs directly. Then we use ptp_kvm, another mechanism, to synchronize each guest with its host. So we never synchronize a guest directly with PTP, but we are confident that the guest is synchronized very precisely; once again, enough for our use case.

So how do we deploy all that? For the people who were here just before, there was a nice presentation on Yocto. Yocto was the first approach we chose, because we were in this industrial, embedded environment. Yocto has many advantages: repeatability, reproducible builds, the features, it can help you with the SBOM problem, minimal footprint. You can design a very small system with a very limited footprint. But in our context, since we use a lot of software and we need to configure it anyway to create the clusters and everything, Yocto wasn't required anymore. For instance, we are not really using images, because if we have three machines, they are all different; they don't run the same image. We need to run an Ansible playbook, we need to exchange keys, we need to configure the cluster. And since we need to configure the cluster with Ansible anyway, we asked ourselves: if we need to configure the machines, why don't we use a standard distribution and then configure everything with Ansible?

That's the second approach we worked on, the Debian approach. Basically, we now create a basic installation medium with FAI, which is a very nice way to create your Debian installation ISO. We only do basic customization on this image: the only thing we do with FAI is put all the software we need into the installation medium, Ceph, Open vSwitch and everything, so we can deploy the machines with no internet connection. We create our USB key with all the software we need, and we can deploy all our SEAPATH machines with one USB key and no internet connection. Once that's done, we have hardware running Debian, but not much more. Everything else is done with Ansible and playbooks: we connect to the machine and do everything we have to do. All the prerequisites, configuring the system, the cybersecurity, creating the clusters and everything. So that is the Debian approach. Just so you know, on the SEAPATH project both branches are still maintained: customers can use the Yocto branch, and those willing to switch to Debian can use the Debian branch. I myself work on the Debian branch, so I know it a little better.

And, yes, cybersecurity. This conference is a lot about cybersecurity, so how do we manage it in this project? Since we are running a standard Linux system with standard IT software like clustering, we used the guide from ANSSI, the French cybersecurity agency, on the configuration of a Linux system: 70 pages of best practices on how to secure it. Basically, we implemented that, and once again we implemented it with Ansible. You can run the playbook and it will harden your system; you can run a rollback playbook and it will undo the hardening. And it's very easy to update, to add a rule, or to do compliance checks: is my system still like I deployed it? I re-run the playbook, and it tells me whether everything is still as before or whether some things have changed (a quick sketch follows below).
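A minimal sketch of that apply-and-recheck workflow. The playbook and inventory names here are hypothetical, not the project's actual file names; only the ansible-playbook flags are standard.

```bash
# Apply the ANSSI-derived hardening to every node in the cluster.
ansible-playbook -i inventories/cluster.yml harden.yml

# Compliance check: re-run in check mode; "changed=0" in the recap means
# the hosts still match the hardened state, anything else shows the drift.
ansible-playbook -i inventories/cluster.yml harden.yml --check --diff
```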
The other thing we have on the Debian branch for cybersecurity is basically to delegate a lot of the work to the Debian community. When you install Debian, you know your system is updated, you know you can do updates, and basically this is not the SEAPATH project's problem anymore. We just have to provide a way to update the system: if there is a patch to be installed, then apt updates it, and it works. We found that a bit easier than Yocto. With Yocto, installing a patch is a lot more complicated: you have to rebuild the image, deploy it, switch over, and you have to have rollback processes. So yeah, we think moving to Debian is also in the interest of cybersecurity.

Now, this is not just an R&D project where we want to try things and make them work. Just so you know, RTE will be using SEAPATH in production in a few months. So we have to integrate it into our environment, and we have to answer all the surrounding operational needs. For instance: how do you back up your VMs? How do you export your logs? How do you check whether a CPU is running at 100%? Administration, cybersecurity profiles, whatever. Everything we call IT tooling: the things IT people need when they administrate the system. So I'm going to show you in a few slides what we did for IT tooling.

The first thing is SNMP, a standard monitoring protocol. As you can see, we have many indicators: the load, the ping, the temperatures, the interface status, the disks, the NTP status, whatever. Once a SEAPATH cluster is deployed, it's very easy to poll the SNMP status and find out about any alerts, for instance if a VM has crashed or if your time source is not synchronized anymore.

The second thing is logs. We are able to export our logs to any log aggregator: you can use Splunk, for instance, or a standard syslog server. Exporting your logs is also something we had to implement to make the system acceptable.

VM backup, very useful. Imagine a situation in which you upgrade a VM and it goes wrong: how do you roll back? How do you export your backups off-site? Once again, we developed many tools in the SEAPATH project to do VM backups, to provide that service, and to restore VMs. We've used this a lot in development: for instance, when we want to clone a cluster, we back up all the VMs and restore them in another cluster. Works like a charm.

For the SBOM problem: once again, we want to be able to state what's running and what our dependencies on other software are. Since we are using Debian, it's pretty easy: dpkg can list every package on the system and all the versions. With this, which once again we didn't code, it exists and we use it, it's very easy to get a list of all the packages, all the software, all the versions, and all the licenses you use. So it's a first step toward an SBOM.
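A minimal sketch of that first step with stock Debian tooling. The license line is an assumption about how one would extract that information (Debian ships it in per-package copyright files), not necessarily how the project does it; the package name is just an example.

```bash
# Every installed package with version and architecture: the raw
# material for an SBOM.
dpkg-query -W -f '${Package}\t${Version}\t${Architecture}\n' | sort

# License information lives in each package's Debian copyright file.
head -n 20 /usr/share/doc/openvswitch-switch/copyright
```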
It's not enough, of course; the project is young. It's a very young project: we migrated to Debian not even a year ago. So we are trying to go in all directions and to solve all the problems. We know we're not done, but we wanted to code first and then solicit the community, you for instance, to help us finish things, change things, or add stuff. That's also the idea of our presence here.

We also developed many tools for VM management, because you want to create a VM, move a VM, stop a VM, clone a VM, create a snapshot, everything. So we developed tools to make that easy, so that you don't have to SSH to the server and run Pacemaker or libvirt commands (a sketch of the kind of commands these tools wrap follows at the end of this part). Once again, the goal is to make it easier to onboard the project.

So I hope I gave you a sense of all the technologies that are used. Once again, it's not a product; it's more an integration of everything. But the idea, as you now understand, is that you take your USB key, you deploy Debian on the servers, you run a few playbooks, and your cluster is up and running and can run low-latency VMs.

Okay, thank you, Florent. So, as we said before, this project is going to be used in production, and in control and command systems we have protections, which are critical applications. So of course we need validation strategies, and they are based on continuous integration and continuous testing. We use the same approach as before: tests as code. As Florent told you, everything is based on Ansible and playbooks, so we rely on automation, continuous testing, and playbooks for the testing process. For example, if you want to cover a security requirement, you add the security with an Ansible playbook, and you test it with a test that is launched by an Ansible playbook as well. And as this is an embedded project, you need to test directly on the targets: in continuous integration, you deploy your code, everything, each time, onto the target.

There are two levels of testing. The first one is, I would say, Linux-oriented: we don't focus on the command and control, just on the Linux platform. For this, each time there is a pull request on the project, we reinstall SEAPATH on a real cluster in our laboratory, and then we run maybe 2,000 tests today, each time. Each time there is a new functionality, we add a new test, and we generate a test report to assess the changes. This way we ensure the reliability and the performance of the platform. With this intensive testing we cover, like I said, cybersecurity, there are tests for the cluster features, tests for the latencies. We try to test every item, and if you have any ideas regarding those tests, feel free to contact us; we can add them.

The second layer is what I call the factory acceptance test approach. Here we use a very common approach from command and control, what we call HIL, hardware in the loop. We have real-time simulators that simulate the high-voltage network: the generation, the solar panels or whatever. We connect our system to this simulator, and the system thinks, okay, I'm in the field, connected to real high-voltage equipment, and then we can test its behavior. Basically, a test we can do is to simulate a fault on the line and see whether the system reacts at the right speed. With this kind of environment, which we also sometimes call a digital twin, you can test the loss of synchronization, the loss of the clock, anything you want. And with those two layers of tests, I think we have a good base for good coverage.
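Coming back to the VM-management tooling mentioned above: a minimal sketch of the kind of libvirt and Pacemaker plumbing such helper tools typically wrap. The resource, domain and node names are made up for illustration; these are not the SEAPATH tools themselves.

```bash
# Snapshot a guest before a risky upgrade (libvirt).
virsh snapshot-create-as scada-vm pre-upgrade --description "before upgrade"

# Ask Pacemaker to move the VM resource to another hypervisor...
pcs resource move scada-vm hypervisor2
# ...then drop the location constraint so the cluster can place it freely again.
pcs resource clear scada-vm
```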
So, what next? As we said before, SEAPATH is an open source project. What we realize is that most people from control and command are not Linux specialists; that's not their business. What they want is a product they can use, and to build their control and command on top of it. As RTE, we did our part in the research and development, but that's not our business; our business is to run substations. So the first thing we'll do is encourage companies to provide support for SEAPATH. The target is a subscription model: we subscribe and we get our services.

The second point is enhancing the factory acceptance test approach, for knowledge sharing and for assisting utilities and vendors. We are lucky to have a very, very nice lab and, I would say, a lot of test history. With virtualization it's pretty easy to rerun tests and tests and tests, and I think that's the only way for everybody to accept virtualization: run dozens of tests, have the system run for one month, one year, two years without interruption on this platform, and show that it's reliable. So basically, more and more testing.

As I said, SEAPATH is not an end product; it's a way to build your platform. So we need input on resilience for critical functions: what is your approach to building your infrastructure? Are you using a main A/main B approach, application-level failover, other ideas? Because the idea of the project is to integrate all those features, and then you can do your own integration.

If I may add: the thing is, there is an architecture issue that we haven't solved yet. The resilience we use, Pacemaker and Corosync, works like this: if a VM crashes, or if a hypervisor crashes, Pacemaker says, okay, I need to restart the VMs on the machine that has not crashed. Fine, but it will take something like 30 seconds to restart a VM. So on one side we host critical VMs that have to react in a few milliseconds, and on the other side, if a hypervisor crashes, it takes 30 seconds to restart the VM. That is not compatible. So what we do is split the VMs into two categories. Either they are not critical, SCADA for instance: it's not critical if you lose observability for 30 seconds, that's not a big problem. For those, you can just use the standard resilience and have only one VM; if it crashes, it moves, you lose 30 seconds, no big deal (a sketch of this standard path follows below). But a protection, for instance, you can't lose for 30 seconds. So there we use main A/main B: we have two VMs, basically, running on two hypervisors. If one crashes, the crashed VM is restarted, on hypervisor three for instance, but in the meantime you still have the main B running, so it won't be a problem. In that case, the resilience is just a nice-to-have that reduces your time in degraded mode. And so we are soliciting the community on this: how would you do it? How would you implement resilience for critical functions? Because technically, Pacemaker alone is not good enough. We thought about main A/main B; we thought about application-level failover, for instance. There might be other ideas.
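For the non-critical path described above, a minimal sketch of a VM managed by Pacemaker through the stock ocf:heartbeat:VirtualDomain agent. Names, paths and timings are illustrative assumptions, not SEAPATH's actual configuration.

```bash
# A libvirt guest managed as a cluster resource: if its hypervisor fails,
# Pacemaker restarts the VM on a surviving node (tens of seconds, not ms).
pcs resource create scada-vm ocf:heartbeat:VirtualDomain \
    hypervisor="qemu:///system" \
    config="/etc/cluster/scada-vm.xml" \
    migration_transport=ssh \
    meta allow-migrate=true \
    op monitor interval=10s timeout=30s
```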
Maybe you want to say something about this? I think you covered it pretty well. Basically, we are also stuck with the architecture of command and control: you cannot have two SCADAs running at the same time, because the rest of the architecture will not support it. So you have to have only one SCADA running at a time. However, for protections, since they have been critical applications from the beginning, the architecture is made to support two protections running in parallel at the same time. So yeah, we are welcoming ideas.

A few words about the project before we move to questions, if you have any. It's an active project; there are several companies working on it. We have a website, of course, we have a wiki, we have a GitHub with many repositories, and we have a Slack channel. So basically, come talk with us, we're always there. And just so you know our planning: we are in Vancouver right now, we will be in Paris at the beginning of June, and we'll be in Prague at the end of June. So, as you can see, it's a very active project. Thank you. Thank you. And if you have any questions, of course, we're here for a few more minutes.

I'm just curious: you're doing VMs, so why is Docker up there? What role does it play? Oh yeah, it's more for the CI. When we have to run Ansible to deploy the cluster, for instance, the GitHub Action builds a continuous integration image, because it needs Ansible and all the playbooks and everything, and then it runs against the cluster. That's why it's throwaway: it's just for the CI and the tests, and then we throw it away. Docker is very nice for that. To complete on that: we said that for us the VM is basically the container, but inside the VM, most of the time, you have containers. If a vendor wants to provide a protection, like I said, he can provide any OS; we don't look into it. But often what we imagine is that one vendor will provide several protections and one network stack, and it will provide one VM with many containers running inside it. So we encourage vendors, providers of VMs, to use containers, because it makes upgrades much easier.

I'm curious, how was the name SEAPATH chosen? It doesn't make sense. No, it's an acronym. What does it stand for? I didn't choose it. You didn't choose it? Anybody remember? We don't use it much, and it might be a little stretched. Any other questions?

What is your observed drift with ptp_kvm? What is your tolerance for that, and what actions would you take if it drifts too far on some VMs? I'm not sure we observe a drift, because PTP is working all the time; it's always synchronizing. If there is one, it's on the order of a dozen nanoseconds, that kind of delay. If I may add something: if you do the PTP synchronization directly in the VM, over the network, you won't have hardware timestamping, so the precision you get is lower than what you get directly on the host. And it's not even compatible with SR-IOV: with SR-IOV, hardware timestamping is not an option, so you wouldn't be able to do proper PTP in the guest anyway. With ptp_kvm, the synchronization is done on the host, and the guest reads the host clock through a KVM hypercall, so there is essentially no delay.
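A minimal sketch of that host/guest split, assuming linuxptp on the hypervisor and chrony in the guest. The interface name, the guest PTP device index and the poll interval are illustrative, not the project's actual settings.

```bash
# On the hypervisor: discipline the host clocks from the substation's PTP network.
ptp4l -i eth0 -m &     # sync the NIC's PTP hardware clock to the grandmaster
phc2sys -a -r &        # steer the host system clock from that hardware clock

# In the guest: the ptp_kvm driver exposes the host clock as a PTP device,
# which chrony can use as a reference clock.
modprobe ptp_kvm
echo 'refclock PHC /dev/ptp0 poll 2' >> /etc/chrony/chrony.conf
systemctl restart chrony
```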
And what is our tolerance, you asked: well, we try to respect the standard. We didn't talk about it much, but there is a standard that governs what we do in the substation, called IEC 61850. IEC 61850 has a PTP profile, and the standard tells you what it expects of the PTP configuration. Well, thank you, everybody, and have a nice day.