Is it okay right now? Good morning, everyone. Good morning, everyone. Welcome to the second day of this year's DevConf. I would like to welcome today's first speaker, a researcher at Masaryk University, who will take us down the long road to autonomous security. There will be a few minutes at the end of the session for your questions, so enjoy, and without any further delay, I'm giving the floor to the speaker. Thank you.

Hello. If you haven't been living under a rock for the past few years, you have probably noticed that the age of AI is upon us. It's almost everywhere. We have driverless cars, driverless taxis, trains that run themselves, oftentimes better than human drivers. We have tools that are able to produce art, or at least pretty pictures, and also those Lovecraftian horrors on the other side. We have language models and chatbots that can pass the Turing test, that can converse with people, write essays, summarize text, and all the other fancy stuff. And it's natural to ask: what is the state of AI in cybersecurity? Where are we on the scale from a Roomba to a homicidal artificial intelligence? That largely depends on who you ask. If you ask the companies selling AI-powered systems for cybersecurity, they will tell you they're swiftly approaching the Skynet level of capabilities. If you ask anybody else, mostly the researchers in this area, you'll probably get the answer that even the Roomba is overselling our current capabilities.

And what's the reason for that? Just a few numbers to begin with. Tesla's driverless beta has driven more than 250 million kilometers, and every time something happens, the data goes up to their servers and is used to refine their models. Stable Diffusion is producing those pretty pictures; they used 160 million images to train their model. GPT-3, the already obsolete one, used 5 terabytes of text data to train and produce the model. What do we have in security? Mostly limited data sets. Not so long ago, for about ten years, the tools for training machine-learning-powered systems were all using the same old, obsolete data sets. And the situation really hasn't changed that much, because there aren't enough data sets, they're not varied enough, they don't cover all the use cases we want to focus on, and they don't really reflect the real-world situation. And we also don't really have environments where we could train those autonomous cybersecurity agents.

So the situation is in a way quite bad, but as I assume most of you here are developers, what is the natural reaction? Do something about it. Create an environment where you can train those agents, where you can work on that cybersecurity. But there's a good reason why the state of these tools, and of cybersecurity autonomy in general, is where it is: it's not an easy problem. So, for starters, if you want to create an environment where you want to train something, you first have to decide which paradigm to use. Will you emulate everything, say with Docker containers? Will you use fully virtualized networks, and so on, or some hybrid approach? And going back to those large numbers a few slides ago, just imagine that you want to run hundreds, thousands, millions of different scenarios, and you realize that you really can't work with virtualized environments.
Because you don't have the hardware and the power to run so many different scenarios, to restart all of them, to prepare the hardware. Also, you usually don't have all those vulnerable machines that you want to try, and so on. So it's a problem to get all the stuff that you need. And even with emulation, you hit the same brick wall: you just can't run it at that scale. The only way to prepare an environment for training autonomous cybersecurity tools is to have some kind of simulation environment.

So, does anyone recognize some of these pictures? I see some heads nodding. The reason I'm showing them is that these are OpenAI Gym environments that are used for training machine learning algorithms. The problem is that you can't really use them for cybersecurity; this is not how you secure your network. And this is the issue with all the environments that are available: they are just simple, abstract problems that don't reflect what cybersecurity is about. Everything is connected and intertwined, the domain is really hard, and it's really hard to come up with an environment that is able to reflect that complexity.

And let's say you get your hands on that type of environment. You have an environment where you can train, where the environment reflects the complexity of the domain, and you can simulate everything you need. Let's, for the sake of argument, say that you have this. Is that all you need for creating autonomous agents, or for having working autonomous cybersecurity agents? The answer is: not really. This is just a high-level description of all the problems that probably need to be solved before you can even think about letting autonomous cybersecurity agents loose. Some of it may not be applicable to, let's say, the civilian domain, because this work was done for the army, but most of it is valid even in a civilian context.

But one step at a time. In this talk I will guide you through the decisions that one has to make when creating a simulation environment that you want to use for training autonomous cybersecurity agents that will actually be usable. So not something that just does abstract stuff, but something that can later be deployed and used in a real-world context. It starts with one of the easier decisions: choosing the modeling approach, whether it's some kind of discrete event simulation, Markov processes, and so on. This is actually one of the few questions that is not really that important, because you can probably get away with anything. The harder part is choosing the abstraction that you want to use.

So, for example, I'll start with the network model. Let's say you want to create an autonomous agent that will be able to somehow guard or attack, it doesn't really matter, some kind of network. Then you probably have to have some representation of that network in the environment. So yeah, it's just a network, whatever that is: a collection of nodes and some connections between them. Is this enough to decide what to do? You probably need to do some refinement. You have to say, okay, there are two types of nodes: we have something on the edge, and then we have active network devices. But on the edge there are different types of machines, like workstations and servers, and you somehow need to account for that in the model. How about the IoT stuff?
You have cameras, you have printers (printers are good for attacks), but how about personal devices, laptops, mobile phones, what not? And all the stuff in the cloud: will you account for that in the model? You have to, because that's just what networks look like right now. So when you want to model the infrastructure, at the network level, you have to account for the current realities. And it gets a bit complicated, and it needs to be detailed enough for the agent to decide, and to decide correctly.

But let's say you decide on some kind of infrastructure model. Now comes another part: how do you model the nodes that are in that infrastructure? Will the type of the node, like PC, phone, printer, whatever, be enough? Or will you have to include more? What operating system is running on it? What services are running on it? And if you think about what different exploits look like, and how the tools that use them work, you probably need to give the agent something more to work with. For some exploits you will need at least some kind of model of the operating system. You may need some model of memory. And for even lower-level exploits, you may need to model the communication between processes. It's your decision, and you always have to think about what the agent will do and what it will be capable of doing.

How about the connections between those nodes? You can decide on the medium: will you be simulating that there's a difference between, for example, going over the air and over the wire? Will you be modeling properties like the bandwidth of the connections? That's needed if you want your agent to, for example, be able to work with DoS attacks, because then you need to model what the connection looks like. And how about protocols? There are many different attacks on protocols that abuse their structure, so you probably need to include that. Or not. It really depends on what you expect your agent to do.

And how about users? That's another can of worms. Do you even want to have users in your simulation? Or do you just say, okay, there are no users, and we're just securing or attacking infrastructure? But if you decide that you will have users, will there be just some generic user type, or will they have distinct identities that are, for example, linked to different data on those nodes? Are the users active? Are they doing something on their own, or are they just reacting to external stimuli? Do they produce traffic? Are they working eight hours a day? Do you model how their behavior changes as the day goes on? There are hundreds of considerations like this, and each of them will affect what the agent will be able to do. And we're talking about both attackers and defenders, because if you don't model something, then the agent is not able to react to it or act on it.

So we have hundreds of considerations, and what I've described so far is still only the passive side: you're describing the environment where everything happens. But when you're trying to create a simulation that's workable, you also need to think about the active side. You need to think about what the agents can do. So, for example, talking now about attackers: how do you model the attacker, how attacks unfold? For quite a long time, this was the default attack model considered in publications: there's an attacker and a target.
That's all. Something happens. Then there were, let's say, DDoS attacks or other attacks that require coordination: more attackers, one target, still nothing complicated. But then you have things like this. That's just a representation of what Stuxnet did, and you can see that the attack path of any APT or more complex malware is much more complicated. So when you're considering how to model attacks, you need the model to enable all of this, if you want the agent to be able to act on it and act reasonably.

And that's another layer of complexity. When you give actions to the agent, they need to be expressible within that model. They need to have an impact, and the agent must be able to learn that impact. Again, that's something you need to incorporate into the model, because when you're implementing it, you need to be able to provide the correct outputs for the agent's inputs. So what will you do? Just some kind of abstract actions? That is, for example, what is mostly being done right now: there are only a few abstract actions, like saying, okay, I'm scanning, or I'm exfiltrating keys, and that's all. But if you've ever used the tools that actually perform those attacks, the actions are usually much more complex; you have to set a lot of parameters for them to work. You can also use some of the attack and defense taxonomies that are available, or you can come up with something of your own. This is just for illustration: these are the attack and defense frameworks from MITRE that provide some structure to possible attack and defense actions. They're high level, and there are many different techniques linked to each of those categories. If you decide you want to simulate that, then you need to go through each of those actions, possibly each of those techniques, and implement how they are reflected in your model. So that's quite a lot of stuff that needs to be done, and it's something you get for free in a fully virtualized environment. But as I said earlier, you really can't do that, because you don't have the hardware. So you're kind of stuck between a rock and a hard place.

Now I'd like to briefly introduce one of these simulation environments, one whose development I started and which is currently the work of many people. I never wanted to do this; I just wanted to produce the agents. But instead I spent a couple of years doing this, and it's only recently that it got into a state where I can use it to train something usable. So, the simulation environment: it's just one combination of all the possible parameters, all the possible decisions that I've described earlier. It's a discrete event simulator that supports message passing. It's mostly used for simulation of lateral movement, which means that the infrastructure is simulated to some extent, services are simulated as running on those nodes, and there's not that much complexity beyond that. It provides a fairly comprehensive authentication and authorization framework. It works with vulnerabilities that are based on CVEs, tied to Metasploit and so on, so it's built in a way that agents trained in the simulation can then be moved to the real world. It integrates with different machine learning toolkits, and it enables the use of different behavioral models, so you can plug in different behaviors or different action sets for the agents and then compare how they work.
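To make the kind of abstractions discussed above a bit more concrete, here is a minimal, purely illustrative sketch of how nodes, services, and agent actions could be represented in such a simulation. The class names, fields, and sample values are invented for this example and are not the actual API or data model of the environment described in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified model of the entities discussed above:
# nodes with an OS and services, services with known CVEs, and a small
# catalogue of abstract agent actions with parameters.

@dataclass
class Service:
    name: str
    version: str
    cves: list[str] = field(default_factory=list)   # e.g. ["CVE-2021-44228"]

@dataclass
class Node:
    node_id: str
    kind: str                      # "workstation", "server", "printer", "iot", ...
    os: str
    services: list[Service] = field(default_factory=list)

@dataclass
class Link:
    src: str
    dst: str
    medium: str = "wire"           # "wire" or "air", if wireless matters
    bandwidth_mbps: float = 1000   # only relevant if DoS-style actions are modeled

@dataclass
class Action:
    technique: str                 # e.g. a MITRE ATT&CK technique ID like "T1046"
    target: str                    # node_id the action is applied to
    params: dict = field(default_factory=dict)

# A tiny "network": one workstation, one vulnerable server, one link.
network = {
    "nodes": [
        Node("ws1", "workstation", "windows-10",
             [Service("browser", "104.0")]),
        Node("srv1", "server", "linux-5.15",
             [Service("log-service", "2.14", cves=["CVE-2021-44228"])]),
    ],
    "links": [Link("ws1", "srv1")],
}

# An attacker plan: scan the server, then exploit the vulnerable service.
plan = [
    Action("T1046", "srv1", {"ports": "1-1024"}),        # network service scanning
    Action("T1190", "srv1", {"cve": "CVE-2021-44228"}),   # exploit public-facing app
]
```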
It even enables integration with things running outside of the simulation, so you don't need to re-implement everything inside it. You can have your IDS or IPS running outside of it, and there's a bridge between the two: it takes the messages that are flowing in the simulation, converts them into something the IDS or IPS understands, lets it process them, and gives you back the response. So it's trying to reduce the effort that the user of the system has to put in.

The reason I'm telling you this is that it's open source. You can try it, download it, play with it; the easiest way to find it is to look it up on PyPI. There's also documentation. It's a research project, so the documentation is incomplete, as it usually is, but you can start doing something with it. We have big plans and the development still continues. A few of the things we're working on: one is tied to the unavailability of datasets, and that is a mechanism that can create realistic scenarios. You just provide it with some constraints on what you want to have in the scenario, for example, I want a large infrastructure with certain services or nodes running, and it will construct a vulnerable infrastructure with attack paths running through it and then instantiate it.

The other important thing is the transition from simulation to emulation. This is one of the big problems in this area: when you have an agent trained in a simulation, you then need to somehow move it outside. This transition is complicated, because in a simulation you're working with abstract actions, and when you try to move them to the real world, they don't really transfer. In the real world you have to use different tools; you have to give the agent the ability to affect the environment. So that's something we're working on, and we have working prototypes where the agent can directly use what it has been trained on when moved to the real world. It still thinks it's inside a simulation environment, nothing changes for it, but it's doing real-world work on the outside.

This may all be too much work for you, and I understand that, because it's really at the beginning. You may, for example, want it to be available as a service that already creates those scenarios, lets you play with those agents, has visualizations and so on, or maybe just one big green button that does all the stuff. We're working on that too. Here is just a sneak peek preview of what we're working on: the AI Dojo project, which we're doing as a collaboration between Masaryk University and CESNET. This should be an integrated environment for the development of these kinds of technologies, with pre-made agents that you can use, for example if you want to train users with agents they can collaborate with, or agents that fight against each other, and it will provide you with all the analytics and all the control. That's going to take some time; this is the current state. So either wait a few years, and then we'll have something and maybe my presentation will be different, or you can just try to code something yourself and see where it gets you.
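For anyone who does want to try coding something along these lines, here is a minimal, hypothetical sketch of what a Gym-style interface to such a simulation could look like. It uses the Gymnasium API conventions; the toy environment, observation, and action encodings are invented for illustration and are not the interface of the environment presented in the talk.

```python
import random
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyCyberEnv(gym.Env):
    """A toy 'lateral movement' environment: n hosts in a line, the last one is the goal.

    Observation: which hosts are already compromised (multi-binary vector).
    Action: index of the host to attack next.
    A real environment would expose far richer state (services, CVEs,
    credentials, traffic) and parameterized actions.
    """
    def __init__(self, n_hosts=5):
        self.n_hosts = n_hosts
        self.observation_space = spaces.MultiBinary(n_hosts)
        self.action_space = spaces.Discrete(n_hosts)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.compromised = np.zeros(self.n_hosts, dtype=np.int8)
        self.compromised[0] = 1            # attacker's initial foothold
        self.goal = self.n_hosts - 1
        return self.compromised.copy(), {}

    def step(self, action):
        # An attack on host `action` can succeed only if the previous host
        # is already compromised (a very crude "lateral movement" rule).
        reachable = action > 0 and self.compromised[action - 1] == 1
        if reachable and random.random() < 0.7:    # 70% exploit success rate
            self.compromised[action] = 1
        terminated = bool(self.compromised[self.goal])
        reward = 1.0 if terminated else -0.01       # small step penalty
        return self.compromised.copy(), reward, terminated, False, {}

# A random-agent rollout, just to show the shape of the training loop.
env = ToyCyberEnv()
obs, info = env.reset(seed=0)
for t in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated:
        print(f"goal host compromised after {t + 1} actions")
        break
```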
That's all for me for today. Thank you.

So first, do you implement a single attack model? Or do you see an advantage in implementing multiple ones, to get better results on the analytical side of the model? We actually implemented multiple models. What we learned, talking about one research model, was that there was a taxonomy of attack actions that sounded good on paper, but when we tried to implement it, we found a lot of edge cases where it was not working, it was not complete, it was overlapping. So we're still trying to find a good model that's comprehensive enough, but implementable in the simulation, and that translates well to the real world.

Microsoft has that security copilot, the helper thing. How does this relate to that? There are people trying to use those language models to drive these agents, and it kind of, sort of works. The problem is that those language models are noisy: sometimes it works, sometimes it does not. If you use one, for example, to drive a penetration testing tool, it works in a way, and there's no big problem with it, because even if you do wrong actions, nothing bad really happens. But you don't want to use it to drive the defense, because you can't really work with, let's say, an 80% success rate. So yes, people are using it and it produces some results, but I think going the last mile, to the point where you can actually trust it, is going to take much longer. What would be a possible approach? That's a good question. I'm not sure; it really depends on how the agents are prepared, and it's a bit beyond my domain.

I was actually wondering about that myself, and at the risk of exposing a lot of ignorance here, what I was worried about was: after you have this thing trained in simulation, when you're trying to protect something in the real world, does it have to know what it's protecting? In other words, understand the configurations of all the machines, the open ports, and everything else? It seems to me the simplest way to gather that information is to have a privileged crawler or something like that, which can pull information from the different machines and find out about their configurations. But if it's doing that, it's on your network, and if that thing gets compromised... Yeah, that's a hard problem, and usually what happens in the simulation world is that agents are provided with that global overview of the state, something like what you describe; that's what they are being trained on. I'm not aware of anybody building a defender that takes care of the whole infrastructure without having access to this. The first time it gets compromised, it's going to get ugly. But then, on the other hand, I'm working mostly on attackers, so I'm okay with that.

How big is your team, or the whole project that's working on this? The AI Dojo is probably eight or ten people, something like that. I was thinking, have you reached out to anyone... I saw the disclaimer that you are funded by some European project, but have you tried securing funding or support from some of the big companies that are certainly interested in this type of work? Actually, no big company.
This research started originally as a research working group under NATO, and we got some support for developing prototypes of those types of agents. But no, we're not contacting large companies, because fortunately we still have funding from public sources, we want to have everything as open as possible, and we're not sure how that would turn out with companies. Are you expecting to have some AI defender or attacker, some actor that will, in the end, change the setup of our networks to keep them secure? Well, that's something I would like. I would like to have those attackers and defenders fighting against each other and improving themselves. One of the reasons is that, in the literature, the attackers are usually pretty stupid; they're not doing much of the complicated work. There's a perception that attackers are like this, and defenses are tailored against that type of attacker. So I would like to have better attackers, so that the defenders can get really better and move to a higher level. Is there integration with a cyber range? We also have one. That's another step. Right now we have the simulation, we have a way to transfer it to emulation, and we have a configuration that describes how the infrastructure with all the agents looks in the simulated environment. We have a way to transform it into a dockerized environment, and all those configurations are pretty easily transferable, so the best agents, or those agents that need to, for example, interact with humans, will be moved to the virtualized environment as well. That's all the questions we've had.

Before we all disperse, I've been asked to inform everybody that, number one, there is going to be a social event today, so you can pick up your tickets and check in during the lunch period. Second, you can give feedback on sched.org, so if you want, you can say how you liked or didn't like the event. Third, after the social event, around 21:30 I think, there is going to be a fireworks and drone show around the dam, so if you want to watch, you can join; it's public, it's not affiliated with the conference or the organization in any way, but it's a nice event anyway. And finally, there is going to be a quiz session tomorrow where you can win some swag by answering questions; that was just on the screen.

Welcome, everyone, to the second session of the second day of DevConf. I would like to welcome the speaker, Aakanksha, sorry, I forgot your surname, a senior data scientist who will talk about privacy in the open source world. There will be a few minutes at the end of the session for your questions, and without any further delay, I am giving the floor to the speaker.
Thank you. Good morning everyone, thank you for coming to my talk. I believe everyone sitting in this room has a mobile phone and a laptop, and you know that when you have access to the internet, lots and lots of data is being collected from your machines, sometimes with your consent and sometimes without it, and it's really hard to track who's taking your information and what they're doing with it. But would it help to know that there's a way all these websites can collect your data, give you amazing recommendations and inferences out of it, but at the same time ensure that your data remains private and nobody gets to know your personal information? That's why I am here introducing homomorphic encryption, which ensures privacy all around us, and the main focus is going to be how we try to ensure privacy in the open source world using homomorphic encryption.

Before we dive into this, I would like to take a moment to introduce myself. My name is Aakanksha Duggal, I am a senior data scientist on the Emerging Technologies data science team in the Office of the CTO, I come from Boston, United States of America, and I have my GitHub, LinkedIn and Twitter all linked here, so if you have questions or concerns about this topic, I'm happy to chat about that.

So let's move on to homomorphic encryption. I would assume that most of us know what encryption is, but I'd still like to take a moment to explain it: it's basically a way of scrambling data so that only certain authorized parties have access to the information about what that data actually means. And what is homomorphic encryption? It's basically the process where we can perform computation on this encrypted data. As most companies continue to develop machine learning models, these models can be a key asset to the company and therefore cannot be directly shared with a client who wants to use the model on their data, and at the same time that data is confidential to the client, who doesn't want to share it with the company providing the machine learning and AI services. That's where homomorphic encryption comes into the picture: it lets you apply the company's machine learning model to the client's data without either of them getting to know the details of the other's private information. Talking about applications of homomorphic encryption, there are tons and tons of them, from healthcare to smart electric grids, from education to machine learning as a service, you name it; homomorphic encryption can be applied to literally any industry where input privacy is the paramount concern.
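As a rough illustration of that model-owner/data-owner setup, here is a minimal sketch using the open-source TenSEAL library (which the talk comes back to later). The feature values, model weights, and encryption parameters are invented for the example; only the holder of the secret key can read the result.

```python
import tenseal as ts

# Client side: create a CKKS context and encrypt the private feature vector.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

patient_features = [0.2, 1.4, 0.7, 3.1]          # made-up data
enc_features = ts.ckks_vector(context, patient_features)

# Server side: the model owner computes on ciphertext, never seeing the plaintext.
weights = [0.5, -1.2, 0.8, 0.3]                  # made-up model weights
bias = [0.1]
enc_score = enc_features.dot(weights) + bias     # encrypted linear score

# Back on the client: only the data owner (secret key holder) can decrypt.
print(enc_score.decrypt())                       # approx. dot(weights, x) + bias
```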
In my opinion, the most important use case of homomorphic encryption lies in the healthcare industry, where precision medicine involves a lot of privacy-related rules and regulations. A lot of companies in the pharmaceutical industry creating medicines need patient data to predict what sort of medicines we should have for certain kinds of diseases, and it's really, really important to also ensure the privacy of all the data concerning the patient. That's where homomorphic encryption is super important, because if you breach any of the rules and regulations associated with patient privacy, it comes at a huge cost. Homomorphic encryption lets you avoid that extra cost you would usually pay if you breached these rules and regulations, and it helps you still make good predictions using all the past data that has been collected over the years.

Talking about how homomorphic encryption can be really vital in the open source world: as we all know, the benefit of homomorphic encryption is to ensure a private, secure, collaborative environment, and that's the common ground with the open source world, where transparency is the most important thing we would like to ensure in open source communities and projects, while at the same time ensuring some privacy and a secure environment for our contributors as well.

Starting off with protecting sensitive information: open source communities often deal with a lot of sensitive information, for example user data, passwords, some financial transactions, and we would like to ensure that this data is not open to everybody, even though it's part of open source projects. Homomorphic encryption ensures that only authorized parties get to see the private information, while we still make sure this data is encrypted, put in the right places, and only used for the right purposes. It also helps with secure collaboration. For example, sometimes multiple companies are collaborating on an open source project; they want to put their technical abilities and knowledge into the project, but they do not want their proprietary information to be shared with the competitor company they are working with. Homomorphic encryption ensures there is secure collaboration between different companies and developers while they contribute to open source projects. The same goes for intellectual property: as we said, the models and the data that these companies bring together are assets to them, and however much they would like to contribute to the open source world, some things cannot be made public. For example, take automated driving systems: Tesla is pretty much the only leader in the market at this point, but if other companies were to join hands around a shared machine learning model and put together their data sets, they couldn't just share those data sets with the whole wide world. What they can do is put their data together encrypted, so all of them use it for the same model, ensuring that all the data they've collected stays private while still making good predictions for the automated driving systems. Finally, ensuring data privacy: it is also very important to ensure that any sort of personal data is protected from being leaked. A lot of open source projects also have secure voting: oftentimes we have to vote for
some decisions that we would like to take in open source communities: do we want to go further with this project, do we want to give extra brownie points to this particular project, and sometimes people don't have the privacy to cast a vote freely. So homomorphic encryption is something that could be brought into the picture to ensure that people have the right to vote and also the privacy to vote for whichever project they want.

Having said that, homomorphic encryption has a lot of advantages, starting with performing inferences on encrypted data just as you would perform inferences on plain data, there is no interaction required between the data holder and the model holder, and it also helps with outsourcing data storage. But everything comes with some disadvantages. Since the computation is performed on encrypted data, it is computationally very, very expensive. Normal computers are usually not designed for homomorphic encryption workloads, so it comes at a huge cost, and even the smallest calculations, just performing an addition or a subtraction, take a lot of resources. Besides that, it also has some limitations in terms of the calculations you can perform on the data. A major part of what we do as data scientists in any of our projects is data filtering, data cleaning, data comparison, but homomorphic encryption specifically lacks this: you cannot compare two values, it basically cannot tell you which value is smaller than the other, and that also makes data filtering or data division an impossible task. So anything that involves operations besides division or comparison can be performed using homomorphic encryption, but this is the main limitation that we have so far.

Talking about different types of homomorphic encryption: back in the 80s and 90s, when the world, not me, first started to research homomorphic encryption, the first scheme and the first type of homomorphic encryption was developed. It's called partial homomorphic encryption, and it started with the Paillier cryptosystem. It allows only basic operations, addition and multiplication, and it has a couple of limitations which I'll go over in the demo as well: you can add two encrypted numbers, but you cannot multiply two encrypted numbers, only an encrypted number by a plain number. So it does have its limitations, but for the era it was developed in, I think they were doing a pretty good job at that time. Then comes somewhat homomorphic encryption, which came with some more advanced abilities. It still supports only two operations, but it has more depth; you can get into the technicalities, you can evaluate algebraic equations, and the BFV scheme is one of the things that comes under somewhat HE. I won't go into too much detail about these schemes, because I think that's a good topic for another talk, but just to give you an overview, BFV is the major scheme that comes as part of the somewhat HE category. And then comes fully HE. As of now, most researchers and companies are using fully homomorphic encryption techniques; it allows you to perform any number of complex operations at any depth, exponentials, matrix multiplication, you name it. At this point, even machine learning models like linear regression, logistic regression,
CNN models, image detection, all of it is possible only because of the fully homomorphic encryption scheme. It uses approximations of real numbers to make predictions and calculations on encrypted data. One small difference between somewhat HE and fully HE: say we have two numbers, 2 and 3, which add up to 5. Somewhat HE would approach this problem as encryption of 2 plus encryption of 3 equals 5, whereas fully HE gives you an approximate estimate of what the number could be. It does not directly add 2 and 3; it would be something like 1.99 plus 2.99, something with a plus or minus error, so it is not an exact calculation. But if you look at long-term, complex calculations, this is actually much more accurate than just treating everything as whole integers, which would introduce errors at some point. So when it comes to complex mathematical operations, it makes much more sense to have this approximate feature, where you can go into decimal points and perform the calculations.

We also performed a comparison study on the various open source tools that are available. Talking about the kind of work I do at Red Hat, the team is mainly focused on research-based projects, mainly in Python, where we come up with machine learning and AI solutions for all sorts of problems within and outside Red Hat. As a part of that, most of our code is written in Python, so we thought that if we have to integrate this capability into any of our projects, it would make much more sense to have the tool written in Python, whereas when you first google homomorphic encryption you will see Microsoft SEAL, which is written completely in C++ and is not something we can directly use in our projects. So the first challenge we wanted to tackle was: we need open source libraries that are specifically written in Python and that can be seamlessly integrated with the code base we already have.

Starting with python-paillier, which implements the Paillier cryptosystem from 1999: it's a partial homomorphic encryption scheme, and as I said it has limitations on the operations it can perform; we can only do addition, and multiplication of one plaintext with an encrypted number. So a shortcoming of this encryption system is that it cannot perform complex operations, but for anybody who's just starting to get into cryptosystems, encrypting numbers, and the encryption area in general, I think the Paillier cryptosystem is the best way to understand the basics of this concept and to get an understanding of how and why we are doing all of this. Then comes PySEAL, which is a wrapper around Microsoft SEAL, but it has a lot of limitations when it comes to importing it into our Python code: it has to be built with a certain number of config files, and there are a lot of dependencies that are often hard to manage. For open source projects, where we want seamless onboarding of new people and new projects, it's difficult to manage something like this. So we researched further and found PyFHE, which was developed a couple of years ago at MIT as part of a PhD project. It's a fully homomorphic encryption library written in Python and includes the BGV and the
CKKS scheme. It has almost all the operations that you would like to perform on your data, but as soon as its authors graduated out of school, they stopped maintaining the project, and there's a lack of documentation; however much we would like to use it, it definitely lacks the documentation to be used on a long-term basis. Then we found Pyfhel, which is also an open source homomorphic encryption library. It has tons and tons of operations available, it has a syntax very similar to normal arithmetic, it's super easy to use, and it uses C and C++ in the back end. It's great for all sorts of homomorphic encryption operations, but if you know data science, most of our data is in the form of vectors, and when it comes to complex operations we also want vector arrays and things you can use on tensors. And finally we pretty much ended our research on TenSEAL. It is an open source library developed by OpenMined, built on top of Microsoft SEAL, super easy to use: it's literally a pip install tenseal and you can start using it right away in your scripts and your Jupyter notebooks. And with the ability to perform these operations on tensors, we can use PyTorch models and perform all sorts of machine learning operations using TenSEAL. We've also done a proof of concept using the TenSEAL library; it's super easy to perform logistic regression and make predictions on any data that you have. So this is by far the most awesome library that we found. OpenMined is also currently working on another open source Python library which is very similar to PySEAL, basically reviving that old library which died a couple of years ago and making it a pip install that you can use for any sort of data, but TenSEAL is more focused on tensors, vectors, and complex machine learning algorithms.

So I think I'm going to move to the demo and give you an overview of what we've done so far and how we got to this research. It looks super easy, like we just did a comparison study, but it's months and months of effort to understand all of this. I would also like you to go to the repository and check it out if this is something that interests you; we have put together all the documentation and all the pain points we went through, there are issues and documentation you can go through.

Starting off with the notebooks that we've put together: as I said, we started with python-paillier, which is the most basic way of getting into the cryptosystem world; you can just go through the notebook and try to understand how and why we are doing this. We import the paillier module from the phe package, assign two random numbers, encrypt them, and perform an addition. Once we add them, we get the result, which is exactly the number we were expecting. We also try the same thing with a scalar and an encrypted number, and again we get a correct result. But the most important thing to notice here is how long it takes to perform this computation. Granted, this is just a micro-measurement of what this addition costs, but it takes a lot longer than expected: the homomorphic addition took 92.8 microseconds, whereas the vector addition, the normal addition, took 102 nanoseconds.
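A rough sketch of what that notebook cell looks like with the python-paillier package (phe); the numbers are arbitrary, and the timing figures quoted here will of course vary by machine:

```python
from phe import paillier

# Generate a Paillier keypair (partial homomorphic encryption: addition only).
public_key, private_key = paillier.generate_paillier_keypair()

a, b = 15, 27
enc_a = public_key.encrypt(a)
enc_b = public_key.encrypt(b)

# Ciphertext + ciphertext addition works...
enc_sum = enc_a + enc_b
print(private_key.decrypt(enc_sum))      # -> 42

# ...and so does multiplying a ciphertext by a *plaintext* scalar.
enc_scaled = enc_a * 3
print(private_key.decrypt(enc_scaled))   # -> 45

# Multiplying two ciphertexts is not supported by the Paillier scheme;
# python-paillier raises an error ("Good luck with that...") if you try.
try:
    enc_a * enc_b
except NotImplementedError as err:
    print("ciphertext * ciphertext not possible:", err)
```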
So the vector addition is more than 1000 times faster than the homomorphic addition, and this comes at a huge cost of time and resources. And if you try to do a multiplication between two encrypted numbers using the Paillier library, what you see is that it gives you an error which says "good luck with that", because it doesn't have the ability to multiply two encrypted numbers. So this was the first set of things they started to work on in terms of homomorphic encryption, and slowly and gradually it improves, and that's what I'm going to show.

So I'll go over to TenSEAL, which is the final library, and make a comparative study of how it is better and faster than the Paillier cryptosystem. TenSEAL, as I said, is super easy to import: just import tenseal and initialize a context with a couple of values and parameters that you would like to specify; TenSEAL also has great documentation on how you should parameterize your encryption keys and so on. Then we take a couple of vectors and try to perform addition. We first perform ciphertext-to-plaintext operations, which is one plain value and one encrypted value, and we do the same for subtraction, multiplication, etc. But let's move to the ciphertext-to-ciphertext calculation and track the timing and memory that it takes. If you look at addition, the homomorphic addition took 24.9 microseconds, whereas the plain addition took 1.71 microseconds. The funny thing to notice here is how far we've come: in the last notebook it was a nanosecond versus a microsecond, and now they are at least in the same unit. So earlier it was more than 1000 times faster; at this point it's just 15 times faster, which is still something that is doable with the amount of resources that are available right now. I think this is still a reasonable amount of resources for homomorphic encryption to require at this point. If you look at subtraction, that is also around 15 times faster, and then multiplication, however, takes much, much longer: vector multiplication is 3000 times faster than homomorphic multiplication. But if you have done any sort of computation on data and machine learning, you know that it's no piece of cake; it does not require just addition, it's much more complex than you would imagine, there are derivatives and so on, so it takes much, much more resources when it comes to real problems.

That's why we performed a proof of concept to see how long and how expensive it is going to be to actually use homomorphic encryption in the real world. We did a logistic regression proof of concept: we took a data set from Kaggle, a heart disease data set, and we wanted to predict the overall risk using all the data that was available. We have the data available here, we inspect what it contains, clean it up a little, remove the NA values, drop a couple of columns that looked irrelevant, and quickly put together a logistic regression model without any encryption on this data. It's just five lines of code: you define a classifier, make a prediction, and finally calculate the accuracy, and this is the final report we get after doing basic logistic regression, and mind you, this takes literally a second. This is a Jupyter notebook, for those who don't know what that is: Jupyter notebooks are an interactive way of using Python code, you can run each cell at your own pace, you can run the cells in any order and get the output right away, so the moment you hit enter on these cells, it just takes a couple of seconds to show the result.
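The plain, unencrypted baseline described here is roughly the following; the file name and column names are placeholders standing in for the Kaggle heart disease data set, not the exact ones used in the demo:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder file and column names for the heart disease data set.
df = pd.read_csv("heart_disease.csv").dropna()
X, y = df.drop(columns=["risk"]), df["risk"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```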
So that's how fast creating models on small or just non-encrypted data sets is. Now we move forward: we create a torch model, initialize the logistic regression model here, which gave us good accuracy on this data set, and then finally we thought it would be interesting to do an evaluation with both the model and the data encrypted and see how long it actually takes. So when we perform encrypted evaluation on this data set, let me just quickly go to the final part, we tried a couple of epochs. When I was doing it on my system, initially I tried five epochs, because that's just a number I chose to start my calculation with, and my JupyterHub just hung up on me, because this was eating up so many resources. Then I brought it down to a smaller number of epochs, I went with three, and I thought that was the sweet spot where it wasn't breaking for me. The average time for each epoch to train was 350 seconds, whereas the normal logistic regression takes barely a second to run, so here it took 350 seconds times the number of epochs. That's super long for doing a fairly basic calculation on a very small data set; this data set has only 4000 rows, and still it takes this long to make a prediction on it. But if we talk about the accuracy, it was actually better than the one we got on the basic logistic regression model. It might be a fluke, but I would like to believe that it took its sweet time, used the encrypted data, and still came up to the level of the plain logistic regression model, which is awesome. So one thing we are sure of here: if anything, this is an accurate model that makes accurate predictions, it just comes with the shortcoming that it's expensive for you. But we are also working with a couple of teams within and outside Red Hat to get some sort of accelerators and to distribute our workloads, so that it is not as computationally expensive as it currently is. So hopefully next time we would have a much more optimized proof of concept where we are doing all of the same things, but in a much faster and less expensive fashion.

That's about it for my demo, I will go back to the slides. I don't think we have the time, but... do we have time? Yeah, I can quickly go over the most frequently asked questions. A lot of people ask: how is homomorphic encryption related to federated learning?
These two go hand in hand, and a lot of people often confuse one for the other. Even though both of them are about security and privacy, about distributed workloads and a collaborative environment, they are distinctly different from each other. Homomorphic encryption is basically: I have the model, and you send your encrypted data to me. Federated learning is training machine learning models on decentralized devices; it basically means that there is a model somewhere, and I encrypt the model and send it to the people who have data they cannot share. So for example, if there is a pharmaceutical company that has sensitive patient data and they want to use my model, they'll tell me, oh, I have huge data sets, I can't possibly encrypt all of that and send it to you. So what I do as a data scientist is encrypt my model and send it to them, and they can use it on their side and make predictions on their data. But coming back to the statement I made when I started my talk, that lots and lots of websites are collecting data and I don't want them to see my data, I don't want them to openly use my data to make predictions: homomorphic encryption ensures the privacy of the data while making predictions, whereas with federated learning the company would still have access to your data; it's the model, which is proprietary information of a different company, that is the encrypted part there. So I still think that when it comes to ensuring the privacy of contributors' or customers' data, homomorphic encryption is the way to go, but federated learning is not bad; it still ensures that your data stays within the company and doesn't go out into the open world. So that's the comparative study between homomorphic encryption and federated learning.

Another frequently asked question is how it relates to confidential computing. Confidential computing involves a lot of hardware, and you need to ensure that you put your data in a private spot. For example, Amazon, Microsoft, all of these big players are putting up storage centers where they ensure the privacy of your data by putting it in a remote location, which is again very expensive: not only does it require storage space, it actually requires physical space to store all of this data somewhere in a private environment. So those are the main differences between homomorphic encryption and confidential computing; there are a couple of blogs out there that you can check out to understand the differences between the two, and this is just my take on how these two differ from one another.

So that's it from my side. This is the GitHub repository, and my email, together with my colleague who worked on this project. Please feel free to ask any questions, and feel free to raise issues on the repository if you're interested in contributing; I would be happy to help you out with that. There's a question. Yeah, so all the tools that I mentioned in the presentation were Python based, thank you for your question. Yeah, I'm going to repeat the question: he asked that since we put all our data in a database, and it often involves a lot of operations to encrypt the data, shouldn't we do it before putting it somewhere, so nobody has access to the exact information? Is that right? Yeah. Alright, that's a very good question. As part of this project we have just done a proof of concept, and the whole point of why we use Jupyter notebooks is just to see the results the moment we
perform any calculation. But if we were to talk about the real world, where real data exists, we would try to create a pipeline for this data, so the moment the data is collected from our customers it goes into a pipeline where a script runs on it and ensures that it is encrypted before it reaches the database. Once it's in your database, it's already encrypted from that earlier step, and from the database you can choose any calculation or algorithm you want to perform on the data, ensuring the data stays private. So I think this is all customizable; I just threw everything together in one notebook, but it's super easy to put it into different scripts and at different points of the pipeline where you would like to perform these operations. Yes? Yeah, CKKS, yeah, so fully homomorphic encryption has both the CKKS and BGV schemes. Yeah, most of the libraries that support CKKS also support the BGV scheme. Yeah, go for it.

So I might assume the performance would be similar if you tried anything else; I mean, that's a reasonable assumption, right? Then essentially, if I were to try something confidential that's actually industry grade, let's say standard image recognition, say Xception, retrain it on something very simple and put it on any reasonable hardware, it would, if I'm right, probably take several years, because this is something like a two-orders-of-magnitude hit. I mean, if this ever became the industry standard, it would essentially push everybody out of the machine learning business except for, like, two companies, because nobody else could afford to do this or to have these things. That's a great question, and this is the fear that a lot of companies, even us, thought would be a really big con of homomorphic encryption: it's super hard to do extensive calculations. But trust me, there's a lot of research going on around the entire world where people are working on this. Personally, my team is currently working with Boston University, where we are trying to come up with some sort of FPGA that helps us accelerate and distribute our workloads, along with some not-so-expensive hardware that would still allow us to do the same sort of operations, but at a lower cost and much faster. Besides that, there are also ways we could write our code, probably just distribute our code, in a way that makes it easier to perform the calculations so that it doesn't eat up a lot of resources, because the biggest resource it's eating up right now is memory: it creates the keys, then encrypts the data, stores it somewhere, and that just eats up a lot of memory. But if we have a very distributed way of doing all of this meticulously, I think the day is not far when we will be able to do all of this seamlessly. And about image detection, there's a lot of research going on with the MNIST data set, if you're aware of that, where there are digits 0 to 9 and pictures of these digits, and the machine learning model tries to recognize the digit just by looking at the picture. There's a CNN model which a lot of people have been exploring on Kaggle using the encryption as well; there's a lot of open research going on on that, and it's somewhat accurate at this point, it's just that since these are images, it takes longer. You know, even in normal machine learning, image detection is much harder than numbers. Yeah, but that data set is very old, it's like the standard,
the first thing you do if you've ever done any machine learning, and it takes like 5 seconds to do something with it, but once you encrypt it, it's like 5 minutes; that's an incredible disadvantage for anybody. Exactly, but we do have to give the benefit of the doubt to homomorphic encryption, because 1999 was the first time an actual homomorphic encryption system was developed, and we've come so far. Since we're starting with logistic regression and CNN models, I think it's only the beginning of this era for homomorphic encryption, and in no time, probably in a couple of years, this is the ChatGPT era, we never know when we'll reach the point where we are able to do all of this seamlessly. Thank you.

All right, hey everyone, thanks for joining us. My name is Oindrilla Chatterjee, and I am a senior data scientist at Red Hat. I work in the emerging technologies group at Red Hat, and I'm based out of Boston, United States. Today we're going to be talking about uncovering new open source communities using graph and network analysis, and I'll let my colleague introduce herself. Thanks, Oindrilla. Hello everyone, I'm from California in the United States, and if you want to connect with me later, you can reach me on LinkedIn or GitHub; feel free to reach out to us after the talk as well.

So before we get started, I would encourage you all to answer the question we have by scanning the QR code. This is just a live poll to get an understanding of what your role in open source is right now: are you a developer, a project owner, a project maintainer, whatever it may be, feel free to put in your answers there, and we should start seeing it populate as well. I see there's a software engineer, a data scientist, so more software engineers, but yeah, as we can see, we all have different roles to play in this open source community. The majority here at this conference, I'm guessing, come more from a technical perspective, so we are more developers and engineers and things like that, right? So all of us have an important role that we play.

Part of the motivation, or the goal, of this project is, firstly, that we want to identify and get notified about the early and emerging open source projects that exist out there. We all know the community is so large that it's sometimes hard for us to pinpoint and identify those new and emerging projects, so that's one thing we would like to achieve. We would also like to look at the communities around these projects: we want to track who the important user groups are, who the important core contributors are, who the main maintainers in your open source ecosystem are, and how these evolve over time. And finally, we would also like to visualize all of this graphically, and that's where the network analysis and social graph analysis we're going to talk about comes into the picture. What we want to do here is identify the maturity of a project over time and see where the interrelationships between projects exist. So how does network analysis help in achieving these goals? We talked about early notifications and trying to track those important projects, and one way to do this is representing them graphically. What we do in the first graph that
you see is you're trying to depict your projects or your GitHub repositories as your central nodes, and you're also trying to identify the contributors around these different projects, so you're trying to see how many contributors exist in a particular project and so on. And we want to further drill down into that graph a little bit more and look at what are the important nodes that exist in that graph, so that's where all those graph algorithms come into the picture: you're trying to identify which are the topmost important nodes in that graph and trying to identify the influential users and so on. And then finally we also want to look at it from a time perspective, so you want to see the growth of a project. Let's say you're starting a new project; you want to project what the trajectory might look like over time, so you also want to track all of this in a historical way, and ultimately we also want to incorporate some AI and machine learning capability to predict what the trajectory would look like for this project to succeed. So that's kind of how network analysis can tie all of this together. And all of this effort is part of a larger initiative called Project Aspen, which is an open source project mainly developed at the open source program office at Red Hat. So Aspen has a couple of components that are currently actively being developed, which we are also using in our work. Firstly there is a tool called Augur where we collect all of our open source project data from; Augur is essentially scraping all of the data from GitHub, and we mainly use that tool to collect all the information from various GitHub repositories. We also have a visualization tool called 8Knot, an interactive dashboard; what you see on the graph over there on the right is produced by these dashboards. It's a tool where you can filter which repository you're interested in and see some metrics like community health metrics, project activity over time and things like that, so there are some built-in metrics in the dashboard that you can further look at. And then we also have the research repo, which is the main repo where we do all of our research and open-ended experiments, which is what we are contributing to. So that's a little bit about the larger initiative, and now we can come to the representation of these projects that we talked about in a more graphical format. If you look at the first one on the left over here, we see that the central nodes are basically your projects, so each represents a particular repository, and you can also find out who the contributors are surrounding that particular repo. Here we do not pinpoint contributors; we're completely obfuscating that part of it, so we're keeping it very private and it's more like just looking at contributor IDs, and that's how it looks. So some of these projects have a larger, denser set of contributors; some of the smaller ones, like the purple one, have a smaller population around them. So this is one way to represent a particular project. Now if you look at the second representation, again here each of the nodes represents your repositories, but what we're doing with the edges is looking at the number of shared common contributions that exist between those repositories, so that's how it's slightly different from the first representation. The edge between those two nodes is basically the weight of the
contributions that might have exist between those projects so those are defined by the activities that happen in the project which we will come to a little later in the slides but this is kind of how we would represent it from a graphical standpoint so yes you have a question as the weight of the edge and also the length so if you see a larger distance it's probably because they have a lesser number of contributions whereas if you see them being a little more closer that means the weight of the contributions is a little higher so yeah that's how the representation is so next we can move on to representing projects as nodes and the shared contributions that we were looking at as edges so here the main goal to do this kind of representation is you want to sort of aggregate all of those shared activities in those nodes and then you want to sort of also further drill down and filter out those which might not have a lot of activity so that you only focus on those key sort of projects in those nodes right so for example here we're trying to look at some of the more popular Kubernetes based repositories so we have a couple of Kubernetes repos we also have some of the OpenShift repositories so OpenShift is again very much related to Kubernetes so these are some projects where we picked because of their sort of known connections to each other and we try to look at how close they are how far apart they are so that the goal is that when you visualize it like this you can actually identify those which are emerging right so if you know one very well-known project let's say Kubernetes you want to know what are the other projects surrounding that very well-known project because those can potentially be your next emerging project so you kind of want to identify those key links between each other so that's what we're trying to do in this particular representation so what exactly counts as those shared activities right so we saw those edge weights between those different nodes so what we count as an activity are things like issues, PRs, commits PR reviews and things like that and the weight that we defined is defined by you know what is the strength of the connection based on those type of contributions being made so basically you're looking at are these contributions done by a maintainer are these contributions done by a developer or a core contributor and so on so that's how those edge weights are being defined and then finally you also want to find those emerging projects so we look at some metrics for a particular project like how many number of forks does this project have how many number of stars, excuse me how many number of stars does this project have and then of course the activity trend over time for a particular project so that's kind of how we look at all these different shared activities so again if you can take the poll and answer this question you don't have to scan it again it should already be there in the previous tab that you had open but I want to ask you all what do you think makes a project rapidly emerging so you have a couple of options to choose from for example do you look at the growth in the number of stars that it has do you look at the number of issues do you look at the external popularity of that project so kind of rating what you think is most important over here so I see some votes coming in some people are looking at the issues PR comments as the most important some people are looking less in terms of the number of forks number of stars so I guess most of us are more 
interested in the activity of a project I'm going to give a few more seconds for folks to add their responses okay so we have some people who are interested in external popularity so that's nice you want to sort of also see apart from PRs and commits and all of that you also want to look at it from external factors right so maybe you've heard about this project in some other outlets so that's also an important feature awesome so yeah thanks for participating these are all good responses and you can sort of see that each of you based on your role also as kind of weighing things differently right so that's what we can also take into account when we do this kind of analysis to make sure that we're capturing the right set of activities that you think is important and that's how you can sort of gauge where your project lives so following up one more question that we have for you is what is the most important insight that you're looking for from your own community so most of us participate in some kind of open source community so what is it that you want to learn about these communities right so whether you're a first time contributor or whether you're a veteran contributor what do you kind of look at and what is important to you when you are thinking about these communities so you can feel free to add your responses over here as well okay so what do we need to change to attract more users and contributors yeah that's a great point so you also want to see how can we improve communities right so what are some things that you should focus on to improve so that's definitely a good thing to look at what part need help to connect them better okay yeah so examples of use cases what's one next project that could be interesting exactly so that's something that we want to also gain from these kind of analysis right so you know that this project is existing but you also want to know how do we make this project bigger so I see somebody said how do you make it bigger what can be some other things that we can contribute to make it interesting so yeah these are all great insights and we hope to incorporate these kind of insights into our analysis further but yeah thanks for all these suggestions and these are some things that we are also looking at we're trying to understand what community managers are interested in what contributors are interested in what developers are interested in so depending upon your role you kind of look at communities differently so that's the reason why we try to get data from different set of projects and try to sort of aggregate them in this graphical representation with that I would like to hand it over to Oindrila she's going to talk more about these algorithms that we've implemented awesome thanks Hema so now that we looked at more of the questions or some of the goals that we want to achieve from these projects let's try to look into some of the more technical details and how do we get to you know assessing these goals so firstly to do that we researched some algorithms which are sent graph centrality algorithms and this is sort of one of the most important area of research when you're trying to identify key players or important players within a graph network so what do important nodes within a graph network mean so in terms of like in graph language important nodes can mean number one they have a lot of links or a lot of like direct connections to them or it can mean that particular node can reach other nodes in like fewer hops it can reach other projects really easily and 
another meaning could be that it really sits in between projects, so it sort of lies on the shortest path between different projects. We'll go over a little bit more of what this means for GitHub repos, but before we get there let's get into some of the algorithms that we researched. The first algorithm that we looked into is PageRank, and PageRank is one of the most popular algorithms; Google came up with it, and it was used to rank web pages or websites that are the most popular within Google search rankings. In terms of using PageRank for our use case: PageRank has a variety of use cases, from social networks to molecular biology. For our use case, since these different nodes represent different GitHub repositories and the edges between them are common contributors or common contributions, PageRank can actually be used to detect the most prominent nodes within this graph network. However, when we tried to apply it to our use case, it was great at identifying the important players, the most veteran or most well-established projects, but it was not good at identifying projects which are important in relationship to the other projects in the community. So for example, if we knew a list of well-established projects and we were trying to understand, in this ecosystem, for example containers, what is another project which could fit into this group, it wasn't really good at filtering out or finding projects in relationship to other projects. So another algorithm that we looked at is betweenness centrality. Here also our main aim was to find influential nodes within this network, and what this algorithm essentially does is it tries to find nodes which almost sit in between other nodes, which lie on the paths between the different projects within a network. We tried to test this out on Cloud Native Computing Foundation projects, so we applied this on a bunch of CNCF projects, and I'll also go a little more into this later, but what we saw was that the more veteran or more well-established projects almost always had higher betweenness centrality scores, and the other projects which were new or which were just introduced into CNCF had smaller betweenness. Something which was pretty interesting that we saw with betweenness centrality was that it was also good at emphasizing a node's popularity in the context of the repositories it is a part of, the ecosystem that it is a part of. The next algorithm that we looked at was closeness centrality, and again this is also a way of detecting nodes which are closest to each other, so this can be used to find nodes which are at the shortest distance from an existing node. We essentially used this to find nodes which are most well connected or closest to a well-established project, so that we get a better understanding of projects which are almost interdependent on each other. So these were the key centrality algorithms that we looked at. Now that we have an understanding of these algorithms, we wanted to apply them to certain use cases and try them out on some real-world cases. One of the first use cases was identifying OpenShift, which for those of you who are not familiar is the enterprise version of Kubernetes, and we wanted to identify OpenShift almost as a downstream of Kubernetes.
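As a rough illustration of the graph construction and the three centrality measures just described, here is a small Python sketch using networkx; the repository names, contributor lists, and the rank-normalized total score at the end are all made up for illustration and are not the speakers' actual data or code:

```python
# Sketch: repos as nodes, edges weighted by shared contributors,
# then the three centrality measures discussed in the talk (networkx).
from itertools import combinations
import networkx as nx

# Hypothetical contributor sets per repository (illustrative only).
contributors = {
    "kubernetes": {"alice", "bob", "carol", "dave"},
    "openshift":  {"bob", "carol", "erin"},
    "docker":     {"alice", "dave", "frank"},
    "jetty":      {"grace"},
}

# Repo-to-repo graph: edge weight = number of shared contributors.
G = nx.Graph()
G.add_nodes_from(contributors)
for a, b in combinations(contributors, 2):
    shared = len(contributors[a] & contributors[b])
    if shared:
        # "weight" is connection strength; "distance" is its inverse,
        # since betweenness/closeness treat edge weights as distances.
        G.add_edge(a, b, weight=shared, distance=1.0 / shared)

pagerank    = nx.pagerank(G, weight="weight")
betweenness = nx.betweenness_centrality(G, weight="distance")
closeness   = nx.closeness_centrality(G, distance="distance")

# One simple way to combine them, echoing the aggregation described later:
# normalize each measure to [0, 1] and sum into one score per repo.
def normalized(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

total = {
    repo: sum(normalized(m)[repo] for m in (pagerank, betweenness, closeness))
    for repo in G
}
for repo, score in sorted(total.items(), key=lambda kv: -kv[1]):
    print(f"{repo:12s} {score:.2f}")
```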
In the years 2011 to 2014 Kubernetes was a well-established project and OpenShift was emerging as a downstream of Kubernetes, and we wanted to trace back in time and see if we could detect those patterns just using centrality algorithms. So we tested this out on three control groups. One was well-known projects, so we added Kubernetes and Docker, which are very much related and which were very established during that time frame. In the emerging projects group we wanted to detect OpenShift, so that's what we added in that field. And in the last section, other communities, we included certain projects which were also emerging and which were appearing in different OpenShift outlets and coming up in those years, like Apache Hadoop, Apache Mesos, the Jetty project and so on, and we wanted to see what these algorithms would show us. The first thing we did was look at what the trend looked like for these projects during those years. We particularly picked one OpenShift project, and as you can see here, we can also just quickly show you the real dashboard, we saw that during that 2011 to 2014 range the pull request activity and also the commit activity and the contributor growth for this OpenShift repository were actually growing, so this was actually a prominent project during those years. So we went back and represented those data points from that Git repository as graphs, and we applied these algorithms to them. The first thing we saw upon applying betweenness centrality was that it was very effective at highlighting the Kubernetes and Docker repos, which were obviously well known; the red nodes and the blue nodes that you see are the Kubernetes and the Docker repos. But we also saw that the OpenShift repos were showing up in the graph, the green repos are OpenShift, they actually came up, while the other community repos, the purple ones, were insignificant; although they were very key players elsewhere, they were not really important in that ecosystem, and that's why they were effectively filtered out. In the second representation we saw that this was even better at filtering out those other community repos: here we see that Docker and Kubernetes are pretty closely related, and the OpenShift repos are also there in this ecosystem, although a little further away. The second use case that we tried, which I'll just go over very quickly, was representing CNCF projects using graph algorithms. The Cloud Native Computing Foundation ranks projects according to different maturity levels: graduated projects are the most veteran and the sandbox projects are the newer ones. We wanted to see if we could use these graph techniques to essentially represent the maturity of a project. So we tried a subset of CNCF repos, around 75 repos, and we saw that when applying the second graph representation that we saw before, where the edge lengths roughly reflect the closeness or the degree of connection between those projects, the graduated projects were central in the graph, the sandbox projects were a little further away, and the incubating projects were somewhere in between, which is very similar to the maturity delineation that CNCF does. What was more interesting were the betweenness centrality scores: here we saw that the graduated repos were the biggest blobs, definitely more prominent than the sandbox and the incubating repos. And without going too much into detail, I will show the overall rankings that
we collected from this. What we essentially did was: for all these different centrality algorithms, we scored each and every project based on the ranks the nodes were getting, we normalized them, and we added all those scores to get a final score for each project within that control set. Here we were looking at about 75 CNCF repos, and we just picked a few top projects within each category that we already know, and we saw that the scores were roughly analogous to the maturity level of the project: the graduated projects had a higher total score, incubating projects had a medium score, and sandbox projects had a lower score. This was pretty interesting to see, because it also means that such scoring, or these graph algorithms, can be used to identify which projects could next be introduced into the foundation, or can be used to better quantify these decisions that we are making, or can also be used in the context of an organization. So yeah, that's all we had to share with you today. In terms of some ongoing efforts, we are also trying to extrapolate this work onto contributors or users, so applying these sorts of algorithms to the maintainers and contributors on a project and actually seeing the key players for each project, and we're also trying to periodically come up with lists of new projects that we want to introduce into the database so that we can analyze them further, because our research is limited to the projects that are in the database. And finally we also want to prototype these cool graphs and plots that we saw into the Aspen dashboard that we saw earlier, so that it would be very easy to filter down and use these graph algorithms outside of a Jupyter notebook or outside of a Python environment, just making it easier for non-technical participants of a community to drill down. So yeah, here is the project repo where we are prototyping a lot of this work, and in that folder you'll find a lot of our notebooks; there's a lot of documentation of this work, and this is a collaboration that we've been doing with the open source program office at Red Hat. So if you have more questions or have ideas, feel free to reach out to us or open issues or just look through our work, and yeah, that's all we had for you; if you have questions you can ask us now or also enter them in Slido if you're shy and just want to type it out, it's up to you, but thank you so much. That's a good question. So right now we are relying on the Augur database; Augur is a CHAOSS foundation project, and I think their main data source is GitHub right now, but I think the APIs are very similar, so if Augur is able to ingest GitLab I think the analysis is essentially the same, or if there's a separate database that we can fetch data from, I think it's easy to just get GitLab data as well. Yes, we are not directly getting it from GitHub, because you know how API calls and API limits and everything are, so Augur has been a very sustainable, great source for us, because it also has different tables and it really has a very extensive schema for different kinds of data; it really helps us go from one project to another project, and it has great links in terms of contributors and participants, and it's a good relational database schema that just helps us.
Yes, it's a relational database which we feed into more unstructured analysis; our analysis is definitely more non-relational, more unstructured, but our source is all SQL queries and then we ingest it into a Python environment. Reach out to us if you want to open issues, or we can talk further; honestly there are more use cases, like we saw earlier, that we are willing to try this on, so if yours is a community that you have ideas about and you want to gain some insights on, we can work together and just spin something up. Yes. Yeah, that's a good question; that's something that we are currently looking into. As of now we looked at a specific, defined use case that we could test and validate, but the next step is exactly to your point: we want to do this internally at Red Hat, so we want to also identify what are the top security projects, what are the top cloud native projects, so that is something that we are going to do next, to look at it from that particular vertical in a given organization. So yeah, good question, that's something that we are looking at. Yes. That's a great question, and I don't know how much of this I can go into, but in general, we are part of the emerging tech group and we work closely with the open source program office, and usually the goal for us is to look at what are some emerging projects which can be useful to Red Hat or which can be a good community for us to invest in. So the ideal goal for this is to inform those decisions, and also for us to validate some of our investments, to quantify them better, and just see whether there is enough data backing, whether there is metric backing, or whether there are some indicators that we need to be aware of before we take the next step or the next action. Right, right. And also, if it's any new incubating project that's happening at the company, just to keep an eye on it: are we giving it the right resources, is the project going how we expect it to go, what are the next milestones that we should be planning for, and things like that, so kind of looking at it from that perspective. Okay, we are at the end of our time, so thank you again. Thank you. It's lunch time, so if you want you can go grab something; there is food for us, there's a break. And also remember that at twelve we're going to be handing out links to the social, so if you don't get it, you can't get it. Okay, everything's set. So welcome everybody to my talk. I'm excited that you came to it, not scared of the parentheses. Thank you. Thank you. Thank you. What's with the parentheses? Well, jokes aside, the talk will be about half an hour long; in the beginning there will be a bit of theory, some explanations, and later on, from about the half, there will be practical examples. If you have any questions, if they are short you can ask them in between; if they are longer perhaps wait for the Q&A. I plan to have 5 to 10 minutes of Q&A, so we should be on the longer side. My name is Adam and I'll be presenting here about superpowers for Clojure and ClojureScript. Who am I? I am a co-founder of OrgPad, as you can see. I'm responsible for infrastructure and security there, so perhaps for some of the stuff that you will see you will think, oh my god, he is connecting to some production environment or whatever, but that's fine. About Lisp: why am I talking about Lisp at all?
Well, Clojure and ClojureScript are dialects of Lisp, and there are many people talking about Lisp; on the Hacker News main page every second day there is something about Lisp or Clojure. Just today I looked and there was a discussion about why Reddit 1.0 was written in Lisp and why they then switched to Python. What is it all about? So I have put down some quotes about Lisp. You can see the XKCD in the middle: these are your father's parentheses, elegant weapons for a more civilized age, stuff like that. Many people say very positive things about Lisp; of course there are some people that say very negative things about Lisp. So what is the buzz all about? You can see the quote by Eric S. Raymond: Lisp is worth learning for a different reason, the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot. So about that enlightenment experience, I hope I will convey some of it to you. So what is Clojure about? Why should we have another language, why should we develop this stuff at all, why should we think about these problems, why should we consider different approaches to programming? We have many problems running and evolving information systems. Among other things, we tend to solve puzzles: we have difficult constructs that we occupy our minds with, and then we don't solve the problems, we solve puzzles, but we don't actually solve the real problems that customers are willing to pay for. And as the systems grow bigger we tend to encounter a Tower of Babel: we have so many different approaches, so many different modules, and everything is kind of interconnected, and not in the right way; it's getting difficult, it's tangled spaghetti. Also, when we write in some languages we tend to discover some hidden semantics: we discover that perhaps concatenating two arrays is not an array, or concatenating an array and an object produces different things depending on the order of operations, and so on. So these are all small puzzles again; it's hidden semantics, it's syntax that we don't intuitively know what to do with. And the problem is that we don't have any predictability, so we cannot go to the boss, we cannot go to the customer and say, okay, we will finish our job in half a week and then we will do some integration tests and so on and it will be all hunky dory. We just don't know that, we are not sure, we cannot promise, and that is really bad, and that's keeping the whole industry back. So are we perhaps not using the right approaches, do we have the right foundations? Well, that's a good question. There are some approaches, usually better known in the functional programming space: we try to focus more on the what, not the how; we don't tell the computer exactly each step, we are not micromanaging the computer, we are more saying, okay, this is the overall idea I want to have, and that is obviously taken to the extreme with the AI bubble right now, where we don't even know what the computer is doing precisely. Perhaps we are too focused on rigid and mutable data structures that are perhaps good for performance, because the processor can work more efficiently using them, but we as humans are not very good at understanding these small pieces connected somehow to one another; we are not good at shuffling bytes and bits as humans. And obviously we have problems with state management all the time; concurrent programming is very difficult and somehow we don't have any good tools to cope.
Right, so what can we do? I would like to focus on what is actually data and what is code. So data is information; it is information in itself, and it has to be processed to convey some meaning, right? And especially, data doesn't carry instructions for how to do something; we can record some instructions, but for them to be instructions we have to do something extra, for instance run them as code. So what is code? Well, that is the set of instructions I was talking about; that is sense 5 in the Merriam-Webster dictionary: instructions for a computer, or within a piece of software. So yes, that's about the definitions; you are probably not much smarter right now, so I will try to approximate or show some examples of what I mean. Well, we have many complex systems, and they run, and we don't usually change much in those systems; we configure those systems. So to some degree it is foreseen by the designers of the system that this will be possible: if you configure the system correctly, you will achieve some goal. And so you have some Red Hat technologies, for instance, like Ansible, and of course you have some configuration files like this, right? So that is quite understandable. But what about web browsers, do they have any configuration, what do you think? Well, HTML is kind of a configuration for a browser: the browser has to render this based on the input, but it's a configuration, so to say. If you are in the Java world then you probably know this one; it's a build tool and it also has a configuration, of course. So is this data or is that code? Well, it's an instruction for how to put pieces of software together, right, but it's represented as data; it's not Java code, it's XML. So there is some duality to data. So what can we do about it? Well, we could for instance think about it in the context of compilers, another big system that we configure by putting in a stream of characters out of which a program usually gets compiled, and that is very complex. Well, we can perhaps think about some other ideas, to represent code and data with the same means. This is a joke about Lisp and its parenthesis-heavy syntax. So I want to finally introduce you to Clojure. What is Clojure? Rich Hickey, the author of Clojure, wrote on the website: Clojure is a dynamic, general-purpose programming language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language, yet remains completely dynamic; every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection. Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. Clojure offers a software transactional memory system and reactive agent system that ensure clean, correct, multithreaded designs. So what's this all about? So, dynamic: what does it mean? Well, there are no explicit static types; you don't work for the compiler, the compiler works for you. General-purpose programming language: it's suited for application programming, for all kinds of real-time systems, web development, right? Perhaps it's not suited for driver development, but if you can imagine writing drivers in Java, you surely could write them in Clojure as well. So that's about it.
The interactive development of a scripting language: well, it's dynamic, so it's much less code, it's not as verbose, and therefore it's much easier to type interactively and do something. Efficient and robust: each time you write something in the REPL or evaluate a program, it gets compiled down to JVM bytecode, so the efficiency should be similar to Java, depending on how you write it. And everything is supported at runtime; if it wasn't, it would be very hard to do interactive development, because you wouldn't know what you can evaluate at runtime and what not, it would be very difficult, it would be unpredictable, right? And what it means to be a dialect of Lisp I will try to answer in the next slides. So the language has some common literals; I think you recognize most of them, however I want to point out the last four lines: nil, ratio, symbol and keyword. Well, nil is similar to null; however, in the ClojureScript dialect, you know in JavaScript you have null and undefined, right, and ClojureScript doesn't have that distinction, both are nil. Ratio is very useful if you want to represent fractions exactly, so you can basically defer losing precision to a later point. A symbol, you can think of it to some degree as a variable: it's something that can hold a function, that can hold data, it's a name for something, basically. And the keyword is something like a label, so it represents itself; it has some properties of a function in certain places, and an especially useful thing is that it can be namespaced, as pretty much anything in Clojure can. So if you represent users, for instance if you retrieve a list of users from a database, then you can say, okay, I get all those users as a sequence of maps, and each of the maps has keys that are named user/name, user/last-name, user/date-of-birth and so on. So, I have spoken about sequences and collections: there are four, the typical list, vector, map and set; they each have some interesting properties, very useful for programming, and I will show them in a moment. So how does a function call look? Well, you see, it's basically a list that has two elements: the first element gets evaluated, it's the print function, the println function, and the second element, and any other element, would be arguments to this function. But you could view it as a list. So how do you define a function? Well, again, it's a list that has five elements: first is the definition, then the symbol that holds the function, then a string that is basically a comment, or rather a documentation string, then a vector of arguments, here you can see there is one argument, and then there is a function that gets called, it's the str function, so it basically concatenates hello and then your name. That is the same as I said: what is great about this is consistent and predictable syntax and semantics, because everything is a data structure, so it is very simple, it's easy for the compiler to evaluate; the whole parsing is basically 800 lines of Java code and that's it, so that is the first stage of your compiler, more or less. You have persistent data structures; what's that all about, I will show in the example. So because we are a bit short on time, I will change to the example, just let me set up. I'll connect to our testing infrastructure; I have SSH'd in, basically. We have a running application in the staging environment, the application server, and I'm now connected directly into that server, into that application. That is quite common; you can imagine that it is like if you had a Python program
and you had the command line, you would basically be directly connected to the running program. So, is that big enough for you, can you read it, or should I make it bigger? I will make it a bit bigger. So what I will do: this is the REPL, I could write something here and it will evaluate, and that actually runs on the server. But that is not how you develop software; you have an IDE, you have everything in an editor, and that is what I will do in the last few minutes. So imagine you have some problems in production: you have people connecting to your server and then, for some reason, in some circumstances these connections drop. You want to debug this, but how do you do that? You could do TCP dumps and you could do all kinds of things, but why wouldn't you use your program to debug these problems? You have everything that you need; you should have all the data in the program. So the idea here is: we have Nginx, and the application server hides behind Nginx, basically it is a reverse proxy, and so I need Nginx to give me more data so I can debug this problem. And so I told Nginx that it should put the port information, the TCP port information, and the round-trip time into some headers that I will parse, so I can work with this information on an ongoing basis. So what I will do, just to simulate something, I will introduce a new route, and this is all data that is actually running in production. I will introduce a new route, let's say it is netinfo, and I will label it as utils.netinfo, very basic. I will take this label, and of course I need to tell my handlers that there is a new route and that something should happen with it, so I write utils.netinfo and then I give it a function, I just give it the symbol, basically; let's say it will also be called like that, okay, and of course the IDE tells me: you don't have this function. So I need to implement it somewhere, and that's what I will do. I have something that looks quite similar over here, where I do the same for IP addresses, so I will do something similar for ports as well: copy the function, name it differently of course, and what does it do? Well, I know the header is called X-Forwarded-Port, so then I get the request. Sorry, I have two screens, it's a bit difficult to navigate. I get the request right here and do something to it; this arrow is the so-called threading macro. What happens here is basically that the first thing evaluates and then gets put here before the first argument, so to say, right after the function call, and then this gets evaluated; so it's basically a pipeline. It's very handy because I don't need to read from right to left, I can read from left to right, or top to bottom. So just to speed things up a bit, I will parse the address from the request using some other utility functions, right, so now I have the address. I was looking for the port, so I will write in the port, and I don't have this function, right, so I need to implement that function to get the port.
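For readers who don't follow Clojure, here is roughly the same idea sketched in Python: reading the port and round-trip-time headers that the reverse proxy was configured to inject, and classifying the peer address as IPv4 or IPv6. Only X-Forwarded-Port is named explicitly in the talk; the other header names, the plain headers dict, and all function names are my own illustrative assumptions, not the speaker's code:

```python
# Sketch only: pull proxy-injected connection info out of request headers
# and return it as JSON, mirroring the debugging endpoint built in the talk.
import ipaddress
import json

def address_family(addr: str) -> int:
    """Return 4 or 6 depending on the address family."""
    ip = ipaddress.ip_address(addr)
    return 4 if isinstance(ip, ipaddress.IPv4Address) else 6

def netinfo(headers: dict) -> str:
    addr = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    port = int(headers.get("X-Forwarded-Port", "0"))
    rtt_ms = headers.get("X-Tcp-Rtt", "")   # hypothetical RTT header name
    return json.dumps({
        "host": addr,
        "port": port,
        "family": address_family(addr) if addr else None,
        "tcp_rtt_ms": rtt_ms,
    })

if __name__ == "__main__":
    demo = {"X-Forwarded-For": "2001:db8::1", "X-Forwarded-Port": "54321",
            "X-Tcp-Rtt": "12.4"}
    print(netinfo(demo))   # {"host": "2001:db8::1", "port": 54321, ...}
```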
So what do I do? I switch to the other namespace, and I see here is the function for how to get the IP address, and I need to implement the port function. So just to shorten things, I will copy the functions in from my notes. You see that I basically look up the header, and then, because a port number is always a number, I can just make it a number and not a string. So that is what I do, and these other things are basically the same thing but for the round-trip times of the TCP connections. So I can send it to my REPL, it evaluates, and I have it available. And to see it, the IDE gives me these functions: I take the request, and then I want to have the socket address, and that is getAddress, InetAddress, and getPort. So what does this do? This is interop with Java: InetSocketAddress is something that is built into Java that enables me to make sure I have a valid IP and port, and it needs the IP address for that to work. And I want to know what type of address that is, if it's IPv4 or IPv6; that is just more information for my debugging. So I do that very quickly, an inet-address-type function, and just getAddress on the socket address, right, so that just gives me the address, basically, and now I need to distinguish what type it is. And then I will just ask what object that is, instance? with Inet4Address, right, in inet-address-type: so if it's an instance of Inet4Address then I just return 4, if it's an instance of Inet6Address I return 6. So now, with all the information in place, I can return something, right? Plain text would be a bit insufficient, so what I do: I will return it as JSON, because why should I complicate stuff, clj->json, and now I can give it just a map of keys and values. So I will get the host string from the socket address, my port that I was talking about that I didn't have before, and I can give it the TCP round-trip time, and I just use the other functions here because I don't have to do anything else for that. So you should be able in a minute to go over to... oh, and of course I need to change this to application/json, right. And so of course I can now send it to my REPL, and you should be able to go to a specific address and get the information. So let me send these rows there; I need to evaluate basically all these namespaces, and you see there is some state management already built in, so some things need to happen in some specific order, and that all got handled by the infrastructure for me, so I don't need to know in which order it has to happen, it gets evaluated from my source code. And so, just to check in the last minute, we can skip here, and basically I should be able to... yes, right. So this, and every time I call it I get a different number, you see, so it works, right? And it seems to use the same connection each time, but it just has some different parameters for TCP. So that's about it, a short example; I think we are about done, it was all quite fast, I think. Any questions? In that case I will show you one more thing. That was all Clojure; I think for some of you it would be interesting to know how it looks when you develop in ClojureScript, so just a very short idea. I now have, in development on my local machine, this copy of OrgPad, one OrgPage basically, with one cell, one image, and now if I want to just get these units and see what they look like, okay, I need to connect to it. So here you see that basically my browser is right here, and here I have some image; the content is just some HTML, an image. Hello, I am Eugene Syromyatnikov, I am a
strace developer, and I am going to talk about netlink decoding in strace. Probably everyone here is familiar with what strace is: it's a diagnostic utility, a syscall tracer. But it's not only a syscall tracer, it has some other capabilities, although they're not so prominent, and this is mostly the result of the fact that the interaction between user space processes and the kernel is not confined to the syscalls themselves, and strace tries to capture as much as it can of this user space/kernel interaction. So some of the items mentioned here, like multiplexer syscalls, or io_uring, or BPF, are of course themselves syscalls, but the issue with them is that they don't behave like other syscalls: basically they do not impose any specific semantics on their arguments and leave that to the rest of the kernel to implement. The famous, or infamous, example is ioctl, which basically has no semantics of its own; it's just a way to do something with something associated with a file descriptor. But there are also multiplexer syscalls that are pretty much a kitchen sink. For example, fcntl is mostly used for controlling flags associated with file descriptors, but it also controls seals associated with file descriptors, and some of those kinds of flags are not associated with the file descriptor itself but with the underlying file. The same goes for prctl, which is an abbreviation for process control, and so on and so on. io_uring is particularly egregious because it basically hides the usual user space/kernel boundary that can be inspected with ptrace, by introducing an asynchronous kernel mechanism that has to be inspected in a different way. But we are here mostly talking about netlink, and it's probably worth noting why netlink is needed in the first place, because, as I already mentioned, there already exists a syscall, which is ioctl, that basically provides the ability for kernel developers, for the kernel, to provide any kind of service associated with a particular file descriptor, and by virtue of issuing these file descriptors, for example using special syscalls like perf_event_open, signalfd, memfd_create and so on, you can basically produce these file descriptors as you go and not be confined to the devices that are available via /dev. So ioctl is a vehicle that is used for implementing many kinds of kernel interfaces, most of them of course device-bound, for various kinds of devices like Video4Linux, GPIO, NBD, RTC and so on.
But also, as I mentioned, by virtue of producing virtual file descriptors, some parts of the kernel that are not directly associated with devices can also be controlled via ioctl, such as seccomp or device mapper, and that is actually kind of problematic, because the usage of ioctl, both on the user space side and in terms of kernel implementation, involves dealing with lots of different kinds of problems that people most often are not aware of. For example, if you implement some ioctl, you probably want it working on architectures that support compat processes; the most notorious one is x86, but it's also ARM, MIPS and several others. You probably want the ability to extend the interfaces without adding new ones every time you need to add a new field or a new flag, for example because you forgot to check the remainder of the field, and since user space can pass garbage, it will pass garbage there, and then you can't use the remaining bits of the flags. And all this knowledge, all the conventions implied by ioctl: there are numerous guidelines on how to implement ioctls, how to handle compat, how to handle extending the interfaces, how to write these ioctl interfaces and the structures used by these interfaces in a way that is extensible. And beyond the issues with ioctl itself, people usually don't care about any of this at all and just hard-code ioctl numbers in their code, so it brings problems. So one of the issues that netlink addresses is basically trying to be a better ioctl, providing a better general facility that allows various parts of the kernel to implement user space interfaces. By the way, how many here know what netlink is? Okay, so I don't need to explain what netlink is. So netlink does a lot of heavy lifting by imposing a specific protocol and structure, basically mandating that every part of the message has its type and its length, and also the kernel part of netlink has certain facilities for parsing and verifying messages passed from user space, so even though it is a lot of boilerplate code, it is still much better than ioctl. So many interfaces, existing ones as well as new ones, decided to switch to netlink, and there is a certain shift in usage towards netlink compared to ioctl; for example, NBD has switched to netlink, and iproute2, since netlink was historically created as part of it, also uses netlink instead of the historical ioctl interface, and so on. So it would be nice for strace to handle it, and it does, since 2016; it's not a recent feature, it has been there for several years. The initial implementation was done as part of two Google Summer of Code projects, first by Fabien Siron and then by JingPiao Chen, who created a lot of the infrastructure for handling the netlink protocol and several of its families, and since then the decoding implementation in strace is maintained and extended as time permits. So, a bit about the implementation. There is actually not much to talk about, because netlink, as protocols go, doesn't have many peculiarities. Probably the one major wrinkle that strace has is that we have a different way of handling memory, because we don't have the netlink messages in local memory but rather retrieve them from the tracee's memory, so we have pretty elaborate error handling in this case, and instead of handling the full netlink message at once we retrieve it piecewise and handle the possible errors that may occur during this retrieval and handling.
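To give a feel for the type-and-length structure the speaker mentions, here is a small Python sketch that sends a synthesized RTM_GETLINK dump request on a NETLINK_ROUTE socket and then walks the type/length-prefixed attributes in the replies. This only illustrates the netlink wire format from user space; it is not how strace itself (which is written in C and reads the tracee's memory) decodes messages, and it runs on Linux only:

```python
# Linux-only sketch: talk NETLINK_ROUTE directly and walk nlattr TLVs.
import socket
import struct

NLMSG_DONE, RTM_GETLINK, RTM_NEWLINK = 3, 18, 16
NLM_F_REQUEST, NLM_F_DUMP = 0x1, 0x300
IFLA_IFNAME = 3

def align4(n):
    return (n + 3) & ~3

sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
sock.bind((0, 0))

# nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid,
# followed by struct ifinfomsg (16 bytes) for RTM_GETLINK.
ifinfo = struct.pack("=BBHiII", socket.AF_UNSPEC, 0, 0, 0, 0, 0)
header = struct.pack("=IHHII", 16 + len(ifinfo), RTM_GETLINK,
                     NLM_F_REQUEST | NLM_F_DUMP, 1, 0)
sock.send(header + ifinfo)

done = False
while not done:
    data = sock.recv(65536)
    offset = 0
    while offset < len(data):
        msg_len, msg_type, _, _, _ = struct.unpack_from("=IHHII", data, offset)
        if msg_type == NLMSG_DONE:
            done = True
            break
        if msg_type == RTM_NEWLINK:
            # Attributes start after nlmsghdr (16) + ifinfomsg (16); each
            # attribute is a TLV: u16 len, u16 type, payload, 4-byte aligned.
            attr = offset + 32
            end = offset + msg_len
            while attr < end:
                a_len, a_type = struct.unpack_from("=HH", data, attr)
                if a_type == IFLA_IFNAME:
                    name = data[attr + 4 : attr + a_len - 1].decode()
                    print("interface:", name)
                attr += align4(a_len)
        offset += align4(msg_len)
sock.close()
```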
So rather than rely on libnl and its facilities, which basically let you pass a set of attributes and get back a table with all these attributes parsed, we perform a kind of progressive parsing using type-keyed decoder tables. With regards to testing, it is mostly done the same way as for most other parts of strace's decoding capabilities, which is: we synthesize some payloads we want to check the parsing for, perform specific syscalls on a netlink socket with this synthesized payload, and check whether the message argument has been parsed the expected way. So here is an example of this kind of parsing. As mentioned on this slide, we have a set of various macros that we use in this kind of testing, because the various implementers of netlink interfaces use attribute types in various ways and define different hierarchies of attributes, and as a result we have an extensive set of parsers. This is one of the simpler ones, which basically checks whether an attribute payload that is interpreted as an object is parsed properly, by trying to supply a shorter message and a message that is in unreadable memory. So here on the second slide we have unreadable memory, which is denoted by its address, and some successful parsing. Here is an example of strace output when it traces the ss (socket statistics) binary that is part of the iproute2 program suite, and you can see that, even though I tried to tone it down a bit, it's still quite elaborate, because, well, netlink messages are quite elaborate and some of them have quite extensive headers. For reference, the associated message is illustrated on the right, but you can see here that we try to handle various kinds of attributes and various kinds of data, like internet addresses and this kind of stuff. The interface is not decoded because it's zero; I don't know why it's zero, but probably because the request doesn't have enough flags. Basically the same thing you would expect from other parts of strace in terms of decoding capabilities. So let's turn to the more interesting part, which is when netlink decoding is not so boring. It's not so boring when the kernel breaks something. For example, one of the sock_diag protocols, the SMC protocol, decided, after successful implementation of IPv6 support for the protocol, being able to be tunneled on top of IPv6, to supply the family it's tunneled over as part of the inet_diag header, which is used by strace to discern which protocol an inet_diag message is associated with, and basically it may be impossible for strace to decode this protocol correctly anymore. The funny thing is that it mostly went unnoticed initially, because the main user of this netlink protocol, which is ss, doesn't implement dumping of sockets for all protocol families at once, so it doesn't need to discern between messages belonging to different families; rather it performs the dumping for each address family separately and just ignores this family field. Another part is that most of the time you can understand how something should be interpreted by looking at the attribute type and knowing where in the message you are, but that's not always the case, because sometimes these attributes or attribute hierarchies are protocol-specific or address-family-specific, and the way some parts of the kernel implement this is to provide an additional kind or address family attribute that tells what kind of address family it is, which works nicely when you parse the whole netlink message at once, but
it doesn't go well with progressive parsing, so as a result you have to perform some context tracking and pass this information between the decoders. All the parts of the kernel that provide this information about the protocol or the address family that the attribute hierarchies are associated with first provide this information about the protocol and then the rest that is protocol-specific; so far it works well, but who knows how some implementer will decide to use it, because another way to provide this kind of information is to basically use the protocol as the type of the container attribute. Unfortunately, at least one place that provides the information this way got botched, because almost all families do exactly that except for AF_BRIDGE, which decided that, well, it has to be special, and doesn't provide this container, and unfortunately it can't be fixed because the hierarchy is part of the UAPI. There are also some minor things; for example, as I mentioned, since netlink is a better ioctl, it is supposed to take the load off interface implementers, but unfortunately all of this goes away once someone decides to pass structures as-is as payloads of netlink attributes, which brings back all the issues associated with extending attributes, maintaining compat compatibility, alignment of not naturally aligned fields on different architectures, and so on. So despite the fact that this is more or less known and these mistakes are less frequent than they used to be, they still happen from time to time. Luckily for strace, most of the time it is at least possible to discern between various versions of a structure based on its size, which is not always a possibility with ioctl, because some ioctl interfaces do not populate the size properly in the ioctl request number. Yeah, another fun fact: some parts of netlink interface implementations just ignore the fact that the netlink attribute type field is for types, and use it as an array index. Currently we have several protocols supported in strace. Probably the most prominent support is NETLINK_ROUTE; I'd say that almost all message types are supported, almost fully, we have some issues here and there, for example some protocol-specific attribute containers. And sock_diag is not actually fully supported, because there is one attribute that is not properly decoded, which is associated with protocol-specific socket information, but yeah, it's in good shape, at least for these several protocols; there are of course many more. So, some fun statistics: I already compared netlink and ioctl several times, and they're pretty much comparable in terms of decoding implementation. The ioctl decoders are by far the most extensive set of decoders present in strace, they account for more than 10% of all of strace's code, and the netlink decoding support is actually quite close to that. As I mentioned, we have quite extensive NETLINK_ROUTE support, and one of the most prominent parts of NETLINK_ROUTE is the link set of messages; it's basically the third largest, or second largest, file in the strace codebase, right behind the main strace file itself and the file with the ioctl entries. As for future plans, basically the area where strace is significantly lacking is support for generic netlink; several attempts have been made to support it, and coincidentally we have gotten yet another request basically this week, so it can probably be addressed and pave the way for adding various generic
netlink protocols into strace. Another big area, which is probably quite important, is the lack of proper netlink filter decoding, because the tc part of iproute2 supports matches and filter types and such things. Also, the kernel is finally moving to some machine-generated code and machine-understandable specifications, and it would probably be nice to support them, similarly to the way we support generation of ioctl decoders based on such specifications. And this is probably it, so, any questions? Yes, please. No, so it's recent, from the last two years or so: YNL is a set of schemas for various netlink protocols that are written in YAML, so one part is describing existing netlink protocols, for example there is a lot of effort in fully describing ethtool, and another part is generating the parsers and helpers for this netlink handling in the kernel, because, as everyone probably knows, netlink parsers in the kernel are boilerplate code that can be easily generated, I would say. If you have any other questions... The good thing is, if the kernel would use these generated interfaces, it would reduce the number of these oddities that strace has to work around. And what could be done to convince kernel developers to actually do this? Well, it's basically the same story as with ioctl: the first several historical netlink protocols are written the way they are and have these peculiarities, and we basically have to live with them, but as the subsystem matures it gets more streamlined, and newer things, for example generic netlink, actually have some provisions, have strict specifications for how to handle structures, how to handle arrays, this kind of stuff, and provide specific netlink schemas that allow implementing this in an extensible, uniform way. So basically it will probably be the same as we have with ioctl, which is historical and contains all these peculiarities, plus some generated part that supports the newer interfaces and newer protocols. With regards to the output: no, it doesn't; there are a few examples in the slides, but I must admit it doesn't. Yes, almost all strace output is currently tokenized; there are several remaining bits, but there can definitely be a specific push at least to colorize that, and the next step is probably to provide additional information for output generators that can be used for producing more structured output. We actually have an attempt at that, but again, we're now at probably the third or fourth iteration; it's gradually getting into better shape, because when it was first attempted in 2015 it was basically a rewrite of each decoder in a structured way, and that was, well, unsustainable. Right now the latest iteration is pretty close to something that is actually upstreamable; at least the tokenization, and colorization, can be done almost everywhere already. There are some weird parts, for example the device mapper decoder, the decoding of s390-specific syscalls, and ptrace commands, that are not very well tokenized, but otherwise it's pretty much streaming tokens. Thank you very much, and if you're so inclined there is some strace paraphernalia here, so you can take it. Morning. I was in another room and I explained what Podman Desktop is and what it does with a container engine, not only Podman, because it works with Lima and Docker, and now I will go a little bit fast-forward on all of these questions and try to do stuff with various Kubernetes, break
Maybe some live demos will not work, but that's okay; I'm super stressed about the live demo, but that's the most important part. So, why did we start Podman Desktop? One thing is that Docker Desktop changed its license, and that opened the possibility to start a new project. But it's also that with Docker Desktop and Docker Compose you can develop something that you cannot easily deploy in production: if it's Kubernetes it may not work, and if it's OpenShift you're almost sure it won't work. So the idea is to shift left; rather than having a container that works in development but doesn't work in production, give the developer the possibility to test as fast as possible that what they are doing works correctly on Kubernetes and OpenShift. Okay, this is another very beautiful picture; I'm not able to draw something that beautiful, but that's the idea. The idea is that it's not only controlling the container engine, the Podman engine or Lima or Docker, it's also playing with Kubernetes, and you have just one click, or a few clicks, to transform what works on your container engine into something that works on Kubernetes. That's the old dashboard, shown very small so you can see everything; one of the things that will happen next is a redesign of this dashboard. The reason we do all this is that containers are Linux: Windows containers exist, but nobody uses them, and when you want to run containers in production they will run on Kubernetes and they need to be Linux containers. On Windows and macOS that means you need to run a virtual machine with Podman, or another container engine, in it, and that was not easy. So the first task of Podman Desktop was to build an installer that helps you install and initialize everything. Ah, it's not here... so we have installers for Windows and macOS; we have maybe five different installers for Windows, and one installer for all of Linux. For Windows we have an MSI, an executable that you don't need to install, a Chocolatey installer, a winget installer and a fully air-gapped installer, so a lot of options. For macOS, same thing: we have an installer that works multi-arch and then installers for Intel or for ARM. That was the first stage of Podman Desktop, being able to distribute things and to install Podman Desktop and Podman everywhere. That's done and it's stable; let's say it's stable, it works, we have some bugs sometimes, but most of the bugs are gone now. Then, running containers: you can pull images, push images, build images, configure registries, and you can run Compose files. For the people who are behind a proxy we have options; that was something we built in January and February. You can configure a proxy, you can add your own image registry, and you can install in an air-gapped environment. Moreover, if you already have a Kubernetes configuration you have access to it; if you already have a remote Kubernetes cluster, it's already available in Podman Desktop. Then the basics; I will go fast because I spoke about that this morning: creating a Podman machine, pulling an image, starting a container, building an image. If you want to go to the slides you can look at them, there are small videos for everything; I will fast-forward on that, that was this morning's session. This is where I will start from now, and I will try to make demos that work. For those not familiar with pods, well, everybody here knows; I will start from the situation where I already have pods running on the container engine.
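For anyone who missed the morning session, those basics map roughly to the following CLI commands; this is only a sketch, and the image name, tag and port are placeholders rather than anything shown in the talk:

```bash
# Create and start the Linux VM that Podman needs on Windows/macOS
podman machine init
podman machine start

# Pull an image, build your own, and run a container from it
podman pull quay.io/podman/hello
podman build -t my-app:latest .
podman run -d --name my-app -p 8080:8080 my-app:latest
```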
Now that I have that running on the container engine, I want to push it to some Kubernetes. If you already have a Kubernetes engine configured you can use that one, but I want to run my own Kubernetes, so what can I run? We have started to create some extensions that help you initialize Kubernetes clusters. The first one is the Developer Sandbox: it's an OpenShift cluster managed by Red Hat, with some developer tools like the Red Hat builders and S2I, these things, and a lot of restrictions. You cannot be an admin, after 30 days your sandbox disappears, you can only work in one project, you are limited in RAM, you are limited in storage, and if you start a pod, after 12 hours it's stopped. So that's a trial mode: you can start with it, but you cannot really work with it for a long time. Then you have the option to install OpenShift Local, and there you still have to choose between two versions of OpenShift. You have one version that is full OpenShift with all the features; it needs quite a lot of CPUs and quite a lot of resources, but at the moment that's what works best, if your laptop is strong enough. Then we have MicroShift, which is only networking, ingress and storage, nothing else from OpenShift; let's say it's containers plus networking and ingress plus deployments, but quite restricted. That's a very cool solution but it's very young, so at the moment we have some stability issues with it; I believe the first version that works came out two or three weeks ago, but it looks great for the future if you just want to test things that work on Kubernetes and you don't need monitoring. The last option is Kind. Kind is just Kubernetes running in a container, and that's the first thing that we built; in Podman Desktop that's the first plugin we created, and it should have been the first demo I made, but it's broken on my machine: I cannot start a rootful Podman machine, rootful Podman is completely unstable on this laptop and I don't know why. Developer Sandbox, then; should I go to the video or do the live thing? Okay, Developer Sandbox: that's the console of the Developer Sandbox, and that's the console for OpenShift Local, so they look quite the same. On the Developer Sandbox I have one project, I cannot create a project, and right now it's empty, I'm not cheating. Now I will switch context to my sandbox, first showing that here I have the sandbox in an unknown state, and that's not good... good, good, good, it doesn't work. I will do that demo afterwards; the problem is I started my laptop without the network and things are broken. So let's start with the demo on OpenShift Local, and then I will restart Podman Desktop and hopefully it works. On OpenShift Local you have two profiles, an admin profile and a developer profile; last time I tried with the developer profile I could not do everything I wanted, so let's play it safe with the admin profile. On OpenShift Local here I have the admin, and let's try something I've never done, which is to create a project. So now I have a project that is entirely empty, and entirely empty, where is the rabbit, not in my pocket; and I have my pod now, and I will deploy it to Kubernetes. Now it's asking me in which Kubernetes context to deploy, and I can choose my project, which is the last one, and let's try it. It works, so I have my pod running here and I can see it in the console.
If I look here in Podman Desktop, I see that I have this; let me make it big enough. This pod is running in Kubernetes, in the context crc-admin; the namespace is not displayed here, and that's one of the things that may change in Podman Desktop at some point. The interface is super simplistic: we don't show the namespace, the context is in the system tray, and we don't show deployments; we show containers, pods, images and volumes. So for advanced usage it may not be the best tool, but if you want to move things from your container engine to Kubernetes without even thinking about it, and then hand the tidying up to other people, it's good. So I see my pod here on the Kubernetes side, and I can see the logs of all my containers; that's good. Search doesn't work in this view, but here's one of the hidden features I have to show: if you click in the contents and press Ctrl-F, you can search in the content, and that's very useful. Now that you have something that you know works on Kubernetes, you can copy the YAML and put it in the code that needs to be deployed; maybe you have to clean up some things like the IP address, I'm not sure, but you already have a good entry point. Another hidden feature: when you go to the summary you see that you have your containers, and the question is how you access the app you deployed. Because I believe that will not work... I'm not sure it's the right one, let's just stop this one; that was the application in the container engine. That's one of the things that are still a little bit tricky: now I have my pod running on my Kubernetes, but how do I see the app? That's not something that is easy to do at the moment; I had a question this morning about the evolution and the things that need to get better, and this is one of them. This link leads me to something that is not the right thing at the moment, and the only way I know is to go to Networking and then Routes in the console, and there you have your application that works. I can take a question at this moment... no question. Installing OpenShift Local, let's look at this; that's not something I do as a live demo, it takes too much time. I already asked: has someone here tried to install OpenShift Local, and was it simple? How much time do I have, a few minutes before the questions, or before the end? Before the end, so two minutes plus question time, okay. Damn, I have Kind and... okay, let's try, let's see: if I quit Podman Desktop and start it again, is my Dev Sandbox fixed now? Windows... I forgot to say, I've been using Linux for 25 years, and for this project I'm running Windows. So, is it still a no, or what? It's question time, so okay, let's do this and I'll recreate the connection. The name will be "my sandbox"; I'm already logged in, so I have just that, that's the sandbox. I copy the login command and log in to the sandbox; you will see my token, aha, I copy this line. That's one of the things that still require a little bit of manual work; it's very difficult to retrieve that information automatically, maybe we will manage it one day. And I paste my login command, and now, is it working?
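For reference, the CLI equivalent of what the UI is doing in this part of the demo looks roughly like this; it is only a sketch, and the pod name, token and server URL are placeholders, not values from the talk:

```bash
# Export the running pod as Kubernetes YAML (what "copy the YAML" gives you in the UI)
podman kube generate my-pod > my-pod.yaml   # older Podman: podman generate kube

# Log in to the target cluster with the copied login command, then apply the YAML
oc login --token=sha256~XXXX --server=https://api.sandbox.example.com:6443
oc apply -f my-pod.yaml
```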
It's exactly the same as before, it's just that I started without network. So now I change my context to my sandbox... so I changed my context, and now I don't see the pods and containers that are running in my other Kubernetes context. Sometimes it's confusing: you always see the containers in your container engine, but you only see the things that are in your current context for Kubernetes, so now I don't have access to the other one. I will deploy this one to Kubernetes; it's exactly the same thing, but into my sandbox, and let's see how it works. If I scroll down, I have my pod running, wonderful. But this time it's not local, it depends on the network. Now I can take questions, or I can fail at instantiating a Kind cluster, as you want. Question: how far has awareness of Podman Desktop spread in the Windows world, on platforms other than Linux? I have to repeat the question here: the question is whether Podman Desktop is more targeted at Windows and Mac users than at Linux users. We have metrics on that; I'm not sure they are really good, because you have to opt in to metrics collection at the beginning, and we have very few Linux users: let's say two thirds Windows, one third Mac and two percent Linux. That's really not a lot of Linux, but we also believe that the proportion of Linux users who say no to data collection is higher; that's what we think. We also think that on Linux you already have Podman, or can install it directly, so you don't need Podman Desktop just to install Podman, which means that people who use Podman Desktop only to install Podman are not users on Linux. On Linux, the only reason to install Podman Desktop is that you want this container-to-Kubernetes workflow, which is really a cool feature, or maybe you want to visualize your containers. I like it because it's really nice for me to visualize things, the containers, pods and images, and I have an overview of everything that is there; I like it a lot. To prepare this presentation I have destroyed my Podman machine ten thousand times, so the fear of starting from scratch and starting over is gone. And on Linux I never pruned my images, I only did it when my disk was full, so I don't use Podman on Windows now the same way I used it on Linux before. It's changing my behavior; I don't know if it's for good or not, but I tend to be more tidy with Podman Desktop, because unused images clutter the UI and it's like: I don't need this image, click, click, click, click. It's faster than writing the command line or making a script to delete all the images I haven't used for two months. There are also quite a few features that don't work well on Linux, for example the proxy: the feature as built in Podman Desktop injects the proxy configuration into the Podman machine using the Podman machine API, but on Linux you have your local Podman installation, so you don't use that and you have to edit the file yourself. That's also why I'm running Windows now: when a feature doesn't work, maybe it doesn't work because it's on Linux, because it's an edge case. I would say I am super happy with it, but that's how it is. Are we out of questions, out of time, or not yet? Three minutes. You have more questions?
I can fail a Kind installation if you want; most probably it will fail. On Windows, when you want to create a Kind cluster, you need a rootful Podman machine, and a normal installation is rootless; if you then try to run Kind you get these kinds of errors. And since this week, just this week, on my machine, when I start a rootful Podman machine, the machine is completely flaky: I cannot see the containers, I don't know what to say. Yeah, yeah, maybe. Someone also asked me this morning what I think about CLIs on Windows. On Linux it's straightforward, on macOS it's straightforward; on Linux you have different terminals but it's the same shell, and then we can argue about Bash versus the rest. But on Windows you have to choose which terminal you want: do you want cmd, do you want PowerShell, do you want a Bash in your Linux virtual machine, do you want Git Bash, do you want PowerShell in administrator mode? Just running CLI tools on Windows is still a mystery to me. It's like: okay, here I can run this command and it works well in Git Bash, but then it doesn't work in cmd, and it doesn't work by design. So Windows users are not avoiding the shell because they don't like the shell; it's because it's just broken. I'm the tech-writer type, I'm trying to run Antora and do the usual tech-writer stuff, and it doesn't work. We're out of time. My name is Renaud, I'm a software maintenance engineer, at Red Hat for six years, and if you didn't guess from my accent, I'm French. As a software maintenance engineer at Red Hat I specialize in troubleshooting customer cases related to user space: services, which can be systemd services, systemd itself, yum, dnf, rsyslog, a lot of things. I usually use strace to troubleshoot issues, and I will tell you why. Because this talk is for beginners, I will explain very simply how strace works, when strace can help you and when it cannot help you; then we will go through dissecting what strace outputs, and I will go through five or six example cases where you can use strace and how to use it, and finally we will wrap up with some SELinux integration stuff. So, what is strace? For the people who already followed the earlier strace session, you know, so please be quiet. strace is a tool that lets you see how syscalls are executed by your user-space program, and it prints back the values. It has many, many options; below are the options I am using. I always use the -f flag to follow forks, that is, when a program spawns children or spawns threads; I use -ttt to print timestamps with sub-second precision; I use -v because I like to have more complete decoding of the syscalls; -s to specify the string buffer size; and -yy to decode more stuff, file descriptors basically. And when you want to attach to a process, which I will show you later, you use the -p flag.
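Putting the flags he just listed together, a typical invocation looks roughly like this; this is only a sketch, with a placeholder command name, PID and output path:

```bash
# Follow forks, timestamp each syscall, decode arguments and fds, log to a file
strace -f -ttt -v -s 1024 -yy -o /tmp/mycmd.strace mycommand --with-args

# Or attach to an already-running process by PID
strace -f -ttt -v -s 1024 -yy -o /tmp/pid.strace -p 1234
```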
So, strace is all about syscalls, but what are syscalls? Syscalls are a way to interact between your user process and the Linux kernel; there are other ways, but I won't cover them here. Basically, every time a user process requires access to a resource, it issues a syscall internally, and this syscall is wrapped in a glibc function, basically for error checking and other stuff. For example, when your program, which can be C or Python or anything, calls open, there is internally an open syscall associated with it. Usually there is a simple mapping, but not always; for example, when you create a child using fork, it's not the fork syscall that is used anymore, it's rather clone. You can get a list of all the syscalls available on your system using the syscalls(2) man page; when I put a number in brackets, that's the man page section, and the list depends on the architecture. When you want to know more about a syscall, you just use man on section 2 and then the name of the syscall, without any sys_ prefix. So how does strace work? strace uses the ptrace interface internally to set, in a way, a breakpoint on the process you are monitoring, which we call the tracee. Every time a syscall is entered, the tracee stops and strace gets a notification; strace collects the data it wants to record for you for later and tells the tracee to continue; the syscall happens in the kernel; upon returning, the tracee stops again and strace collects the rest of the data, for example the number of bytes it wrote to the file system; and strace prints back, to standard output or usually to a file, some nice lines that will help you troubleshoot. One thing to note is that you cannot have two users of the ptrace interface on the same process at the same time: if you have already attached the process to GDB, you cannot strace it. It would be nice in some cases, but that's not possible; you cannot do that. So when does strace help? Basically, it helps when syscalls are involved; if there is no syscall, it won't help you. You have command hangs; by hanging we mean you start something and it waits forever, you don't know where, it prints nothing. A program waiting is almost the same, but usually that's more when the network is involved. You can use strace, when you don't know a program, to find which files it is processing, and that can be useful for various things. It can also help you see which libraries are loaded by the program: for example, if some customer set LD_LIBRARY_PATH, you will see that another library can potentially be opened and used, and that makes your program fail later. Also, if you know nothing about what you are trying to troubleshoot, you can perform some kind of reverse engineering of the communication between the program and the rest of the world. Last thing: you can sometimes understand what triggers a specific error, but usually you then have to check against the source code, try to match where in the source you are, and dig from there. Now the cases where strace is of no use; these are the cases I mostly deal with. As I said, strace is of no use if there is no syscall at all: if your program is just spinning on the CPU, there is no syscall involved and strace won't be of any use. If a syscall is not returning, so the program hangs inside the syscall, it won't help much either: it will just tell you that when this syscall is called, with for example this file descriptor, this file, you hang forever in the syscall. In such cases you have kernel-side tools such as trace-cmd that can help; it's more touchy, and it's kernel territory basically. strace may also not help when you have some issue related to a race condition: for example, your program has multiple threads and something is misbehaving; usually strace will hide the problem, which gives you a hint that you potentially have a race condition, but that's all. And of course strace is of no use when a program exits just because it decided to exit due to some internal computation; you always need syscalls to be involved, basically.
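Going back to the syscall naming above, you can check these mappings yourself; a small sketch, where the -e trace= filter is an extra flag I am adding here, not one of the options listed in the talk:

```bash
# List every syscall known on this architecture, and read about one of them
man 2 syscalls
man 2 clone

# Watch a shell fork children for a pipeline: glibc's fork() shows up as a clone syscall
strace -f -e trace=clone,fork,vfork -o /tmp/fork.strace sh -c 'ls | wc -l'
grep clone /tmp/fork.strace
```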
So, recommended usage; it's not really "recommended", it's my recommended usage, meaning that when my colleagues ask customers for straces to analyze, or send them to me, I like it when they use these flags. strace has a lot of flags, I can't even tell how many; maybe there is useful stuff in there, but not for me. I actually haven't read the whole strace man page in six years, because it's too long, there are too many possibilities. When you want to strace a command and its children, you use -f -ttt -v -yy, for example; -o will store the result to a file; -s, which is 32 by default, is how long the strings are that strace collects, and 32 is usually too small, so nowadays I use 1024; here I put 1K and it's good enough. The larger you make this, the bigger the trace will be, and once you have gigabytes in just one strace file to analyze it's not that optimal, so usually I tell my peers 1024 and it's good enough. The first case was starting my command under strace; the second case is attaching strace to a specific running program. The difference, and I'm not sure that will work the same way, is that you use -p, and you can give -p several times to trace multiple processes. Just be aware that if you attach to a program and the program already spawned children before, of course you won't monitor those children. And when a process gains root privileges, for example with sudo, don't forget to run strace as root already, otherwise of course you won't see anything once the program becomes root. For me, for my job, that's easy: I always run strace as root, whatever the problem is. Now that we have the basics, let's look at the output. The output depends on the flags you use; for all these outputs it's -f -ttt -yy. First you have the PID; the PID is the thread ID of the process you are stracing. Then you have a timestamp with microsecond precision, then you have the syscall name and the syscall parameters, and at the end of the line you have the equals sign and the result. The result depends on the syscall: it can be 0, it can be something else. And at the very end you have the time spent in the syscall; I think that one is obtained with the capital -T flag. So here, for example, in this example we enter the ppoll syscall, which waits on some descriptors, and it returns on timeout after 900 milliseconds. For those of you who use Vim, you can set the strace filetype so that you get some nice colors. That's it; now, two examples. From time to time, well, not just from time to time, usually, you get these "unfinished" and "resumed" things. In the previous example the syscalls were on one line, one after the other, but that's far from always being the case; you will often see this kind of thing: the PID, a timestamp, the syscall and "<unfinished ...>", and later, on some other line, you get the "<... resumed>" part with the result.
This happens when you have a multithreaded program or when you are monitoring more than one PID, so it's completely expected, but it can make analysis difficult. I think there are some tools available on the strace website to piece these lines back together, but honestly I never use them. Now, many syscalls return minus one; minus one is usually an error, so is that bad? Well, it depends. There are many syscalls that are designed to fail: for example, when you try to access a file and the file doesn't exist, the syscall will return minus one and set errno, and errno is set to "no such file or directory". So the errnos that you can usually skip, and which can make you think that reading straces is difficult, are EAGAIN, EINTR, ERESTARTSYS, ERESTARTNOHAND and ENOENT, "no such file or directory"; usually these are perfectly normal. Except, for example, if you try to execute an executable called foo and the shell doesn't find it: in that case the shell will try multiple locations, because if you use just "foo" as the executable and not the full path name, it will try the various places where the program could be, and in the end it may fail or not. Another example: the program tries to open some library which doesn't exist; you see "no such file or directory", but it continues, so there is likely no real issue. Real issues are usually when you get EPERM or EACCES, which mean "I couldn't access a resource". Something I didn't tell you yet is that strace also catches signals, and it prints which signals were received by your program. In the example above we see that PID 6138 was doing a select, so it was waiting for something on the network, for example, just sleeping, and it got a SIGKILL, so it was killed, and that's it. You have other examples with SIGTERM and SIGSEGV. When strace sees the signal that was received by the tracee, it prints you the details: here, typically, it says that you have a SIGTERM and that it came from user space, from some other process, and here it prints that the SIGTERM was sent by PID 1, so systemd, and you can then check what it was doing. Sometimes you get signals from the kernel instead: here, that's the case where the kernel killed the program because of a segmentation fault, and the program tried to access address NULL, so it was a NULL pointer dereference. So you have a lot of things in there that are very interesting for troubleshooting.
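A quick way to triage a big capture along those lines; only a sketch, with a placeholder log path, and the list of "usually harmless" errnos is the one given above:

```bash
# Failing syscalls, minus the errnos that are usually harmless noise
grep '= -1 E' /tmp/out.strace | grep -Ev 'E(AGAIN|INTR|NOENT|RESTART)' | less

# Signals delivered to the traced processes (SIGKILL, SIGTERM, SIGSEGV, ...)
grep -- '--- SIG' /tmp/out.strace
```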
So let's go to the examples; I have six examples. First, stracing a command that is slow. I see that all the time on customer systems: they execute a command and it works, but it's super slow, it takes 10 seconds, and with strace it's very easy to see what's going on. In this example we were executing df, and because the execve syscall also carries the environment variables and all that stuff, strace prints the environment, and we can see that this df program had LD_LIBRARY_PATH set in its environment, pointing to some SAP directory, which is not an issue in itself. You can check the execve man page to see how to match the fields: here we have the path name, then the arguments, df -h, and then the rest is the environment, in brackets, continuing onto the next lines. This is a real example: the program was taking a long time, 10 seconds, to execute df, and with strace we could easily see that it was processing LD_LIBRARY_PATH, because it tries to open the libraries in various locations where it couldn't find them, which is not a problem by itself, because the loader then tries the next path. The issue was the time spent finding out that there was nothing there: 400 milliseconds to scan one location. At that point strace is of no more use, but you know where to dig: you have to check why accessing /usr/sap fails but takes so much time to fail. In this case it was failing because of some automount that was happening in the background and breaking. I see that I don't have the complete presentation, too bad, okay. Another use case is stracing ssh being slow or hanging. When doing ssh you need to remember that there are two parts: there is ssh on your side and sshd on the server you are accessing. Many, many times I see people sending me straces of the ssh client, but ssh just connects to the server, and it's the server that does the job, so always remember to strace the server instead. Usually I advise to also strace the client, so that it's easy to match the timestamps and to see the connection, the port being used and stuff like that. So how do you strace the server, sshd? I'll give you a tip here: I get the PID of the ssh server, I start strace on it, I tell the customer to do the ssh and then to Ctrl-C the strace once he considers it was too slow, and then I check it. Doing it that way means you strace all the connections of sshd, even the ones you are not interested in, so sometimes I instead ask the customer to spawn a new instance of sshd just on a specific port, 8022 for example, and to connect to that; but I don't like that and I don't do it much, because then you need the firewall, the open port and SELinux to be configured to allow it, and so on. Finally we get our strace from the server, and we check it. Initially we search for accept: accept is the syscall sshd uses to accept a new TCP connection, and strace shows you all the details, the port on the local system and on the client, written out here on the ssh server side. Some lines later we see the clone: sshd forks a child to handle the connection, so search for that, search for clone, and once you have the clone you know that you are interested in this process and its children. Usually there is no need to check the children, it depends where the issue is, but basically you extract from the big trace that child, PID 23918 here, and all its children, and then you can dig into that. One thing to note: if you use a socket-activated sshd that is spawned for a single connection, there is no clone, because that sshd instance handles just the one new connection. So what do we have here? That's a real example. Some time later in the trace of our sshd connection we can see that a message is sent; this is a D-Bus message, it's there to create a session for the user, no issue with that. So it's sent, and then sshd waits for an answer; the initial answer is EAGAIN, "I have nothing for you because I'm in non-blocking mode, I return immediately", and then we can see the code doing a ppoll, which is basically waiting to get a notification on the file descriptor that is used for the connection to the bus. From there we know what to look at, and we could see the syscall failing with a timeout after 25 seconds. Once you have all this you know that you are basically done: there is some issue between sshd and systemd when creating the session; it waits for 25 seconds and then it continues.
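A sketch of that capture procedure on the server side; the pgrep call is my own way of finding the daemon's PID (the speaker just looked it up), and the PID, paths and patterns are placeholders:

```bash
# Find the parent sshd daemon and attach to it as root, following children
pgrep -o -x sshd
strace -f -ttt -v -s 1024 -yy -o /tmp/sshd.strace -p 712   # 712 = PID from pgrep
# ...ask the user to reproduce the slow login, then Ctrl-C the strace...

# Locate the new connection and the child that handles it
grep -E 'accept|clone' /tmp/sshd.strace | head
```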
Then of course you need to know some internals: sshd internally executes a PAM stack, and in the PAM stack pam_systemd is invoked, which is responsible for setting up the session and so on; so basically here there was no working connection to create the session. Next, stracing sudo and su. For a program that becomes root you have to proceed differently, because as I said, if you are not root already when you execute strace, you won't see anything once the program becomes root. So what I do is strace the shell from which the user will run the sudo command: I use echo $$ to get its PID, then I attach strace to $$, and I tell the customer to execute the command that fails or is slow; usually they run sudo su, which is a bit redundant, but that's life, and we get a usable strace. When you want to strace a daemon, for example crond, to see it execute something, that's very easy to handle. A common mistake, similar to the ssh one, is when you want to strace a failing systemd service: I see people all the time stracing the systemctl command, which shows basically nothing, because as with ssh it just talks to systemd and systemd does the work. To get something useful you have to strace systemd itself, so PID 1. The procedure: you strace systemd, you start the service, you Ctrl-C the strace of systemd once you consider that the service has failed, and then you try to extract what is interesting from the big trace, because systemd was doing other things in the meantime. What is interesting is to find when systemctl was executed, when it said "hey, start this service"; the way it works is very similar to sshd: you have an accept-type syscall on the unix socket, later you see systemd creating a child with a clone, and later again you see the child of systemd executing your service. You will see as many execve calls, and as many children, as there are ExecStartPre and ExecStart commands. Then you can dig into what you want; I have better tools for that now. That was ancient times, when I was doing it all manually; now I have scripted everything: basically you get the child of systemd you are interested in, and then you recursively extract all the children of your service, because it spawns things, and you can easily check in the strace whether some process or service is being killed, or failed with an error, or just broke with no error at all. Some people will say: you strace systemd, you get a mess, and then you have to do filtering; there is another way, which is to hack the service unit and just replace the command in it with strace wrapping your command. But that's bad, because on RHEL you have SELinux, and because of that there will be automatic transitions happening: strace is labelled bin_t and systemd executes as init_t, so when systemd forks the child and executes strace instead of, say, rsyslogd, strace will run as unconfined_service_t; strace will then start rsyslog, and rsyslog will run as unconfined_service_t, which is not appropriate for it. Your service probably won't fail, because unconfined_service_t is open bar, everything is allowed, whereas rsyslog, which is supposed to run in the syslogd_t context, is allowed to do less. That's why I never hack the unit; I just rely on stracing systemd and then filtering.
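A sketch of that "strace PID 1, then start the unit" procedure; the unit name, log path and grep pattern are placeholders, and the grep is just one way to start the filtering:

```bash
# As root: attach to systemd, start the failing unit, then stop the trace
strace -f -ttt -v -s 1024 -yy -o /tmp/pid1.strace -p 1 &
systemctl start myservice.service
kill %1   # stop strace once the service has failed (or kill $!)

# Find where systemd exec'd the service, then follow that child's PID
grep -E 'execve.*myservice' /tmp/pid1.strace
```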
Five minutes left? Perfect. Stracing boot activity is very similar to stracing systemd. I do that rarely, but from time to time I want to strace the entire boot activity, when I have no choice: I see from time to time that a service fails to start at boot, but once you restart it, it works fine. Why? It can have many causes; usually it's because when you boot, the service starts early and there is no network yet, or there is no DNS resolution, or you can have the network but still no DNS resolution, and stuff like that. So the easy thing to do is to strace systemd as soon as you switch root, so that you get everything. Of course your system will start slowly, because strace has a huge impact, but you get everything and then you filter. There is a small trick to do that, because right after switching root you have nothing: first you boot with init=/bin/sh, so that you get a shell prompt at switch-root time; you remount / read-write; and then I execute strace, but in a special way, with the capital -D flag, so that strace becomes a detached grandchild instead of the parent. Why? Because we want systemd to be PID 1: if we were just running strace normally, strace itself would be PID 1, and it looks like that would work, but it doesn't, because PID 1 is special and is used to reap processes that have no parent. So start it this way, and you will have your systemd running with all the children, all the services and so on; and once you can log in, you have to kill strace forcibly, otherwise it doesn't stop. I didn't check exactly why; I think there must be some signal-handling issue, basically. And the last thing is the SELinux integration. Dmitry already mentioned it yesterday: with recent strace you have SELinux integration, which means, and that's very interesting when you want to learn SELinux, that you can see the transitions happening when you execute services. This is an example with rsyslog. When you pass --secontext with nothing else, you get a very small indication: here we can see that the child of systemd that will execute rsyslogd initially executes in the context of the caller, which is systemd, init_t; then it executes rsyslogd, which is labelled differently, with syslogd_exec_t; and this results, on the next line, when the execve has finished, in a change of context to syslogd_t. If you had started by hacking the service unit, you would not get this at all; you would end up in unconfined_service_t, which is bad. This is available on Fedora 36 and later and RHEL 8.4 and later, and for the people who do not have it I have some rebuilt versions of strace on my public space; at the moment I built them they broke a bit, because some of the decoding is not aligned with those kernels, but that's not of much interest.
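A sketch of what that looks like in practice, assuming a recent enough strace build as just described; the unit name, log path and grep are placeholders:

```bash
# Trace systemd with SELinux contexts enabled, then restart the service
strace -f -ttt --secontext -o /tmp/selinux.strace -p 1 &
systemctl restart rsyslog
kill %1

# Look for the execve of rsyslogd and the context change around it
grep -E 'execve|rsyslogd' /tmp/selinux.strace | head
```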
And that's it, I'm done. Questions? It looks like you are no beginners. The question is: I just want to be sure, this is in upstream strace, right, or is it a separate build? It's just upstream strace that I rebuild with the secontext support; I give that to customers sometimes when I need it on RHEL 7, for example. No other questions? If you have a question but you are shy, we can talk later. So the question is: when stracing the boot activity, could we write to the serial console instead? Well, yes, you can, because I think the serial console is already set up in /dev at that point, so you can write to it; but writing to the serial console is never good, because it's super slow and it's synchronous, so writing to a file is nicer. Also the file will be huge, because you have maybe 30 or 40 services starting initially at boot. Yeah, that could be nice, as long as you backport it to RHEL 7; I'm interested. I suppose, if team members support us here and there. Okay, that's it, thank you very much. Good afternoon everyone, welcome to DevConf.CZ. I am Rijin, and I am Naveen; we are part of the developer experience engineering team at Red Hat. Yes, I would like to start with this quote: in 2011 Marc Andreessen wrote in the Wall Street Journal that software is eating the world. Basically his idea was that, day by day, our daily problems are getting solved within software; everything is getting resolved within a software ecosystem. There are a lot of innovations happening right now in the industry, like ChatGPT, e-commerce applications, a lot of things, so how are we managing this innovation from an engineering perspective? Innovation is good, all the time, but how are we managing it as engineers? Managing the innovation means: are you able to keep track of the good things, in terms of good practices and so on? My point is that every time a new innovation happens, the stack is the problem: the stack is getting more complex day by day, and from an engineering perspective, software onboarding, collaboration and management all look different day by day now, so it's good that we try to solve these kinds of things. There are a lot of problems coming up right now; you can see a collage on my screen where I have tried to capture some of the problems that come to my mind from an engineering perspective. When you develop software, deliver software and manage software, a lot of problems come to you, and I have listed some of them here: for example, consolidation of effort, that's a major one; others are management, templatization, better software management, a lot of problems come into the picture. In the current scenario many organizations want to move to a rapid delivery model with fewer resources; that's the lifestyle we are in, and who is going to take care of us? So basically, from an engineering perspective, as an engineer I have to develop a feature, build the code, deliver it to production and manage it effectively; those are the basic fundamentals for an engineer, how to do things in a better way. Engineers are actually struggling to deliver the technical features and everything else, and we are not getting the time; these are some of the problems we are trying to address here. And this is what we are actually doing to ourselves right now: as a developer I am developing the applications, I am maintaining them, I am putting them into production, but that is not supposed to be only my job; it is the job of a developer and of a system administrator who manages the infrastructure. I am becoming the jack of all trades,
but not the master of anything; from an engineering perspective I am working on everything right now, and that is the primary problem for developers today. So, do we have a focused solution where we can see all our efforts in one place? Yes, that is Backstage. Backstage is an open platform for building developer portals; it was homegrown at Spotify as an internal tool, it was open-sourced in 2020, and it has been donated to the Cloud Native Computing Foundation. It lets engineers and managers see what is happening in your organization in a consolidated view, and it has a huge community across the world. This approach actually simplified life at Spotify: Spotify has more than a thousand microservices and software pieces scattered across the organization, and with Backstage they are able to handle that. This developer-portal approach helps them track every piece of software that is scattered around, all in one single view. I mentioned the developer portal, so what is a developer portal? A developer portal is basically an intranet for developers: one single front end for your organization to handle all of your services and software stacks. It unifies your tooling, services and software into a single place. A developer portal primarily integrates the tooling, the services and the documentation, which is a pain for us to manage, and we also get to know who owns what, some insight into who is taking care of what. So there are a lot of insights we get after connecting Backstage to our software ecosystem, and that gives time back to our developers: you focus on building features, and we can track everything in one place, so if something goes down somewhere we will be able to see it; that lets our developers work on their focus areas. Backstage enables better collaboration and empowers teams to do what they do best, with speed, scale and control, and it helps you deliver and manage your software with golden-path practices. Backstage aligns the distributed culture of an organization and helps bring everything together on a single platform. The core philosophy of Backstage is that it acts as an interface to unify things, as I mentioned before. It helps to build a single source of truth for your organization, and it keeps your autonomy in your software stack: autonomy means that different software and different enterprises work in different models, and in relation to that you will be able to decide how your software stack should be set up, and you will get some insights at that point. Regarding ownership, you are able to set up the ownership of your Backstage components: for example, if I have a front end and he builds the back end for that front end, I can set the ownership of the front end to myself and of the back end to him, so we are able to track who owns what; this also supports a responsible software development ecosystem. Let's go through the three core terms of Backstage, which are core, app and plugin. Core is something like what the kernel is for Linux: it is maintained upstream at Spotify and it powers the basic functionality of Backstage.
An app, on the other hand, is just an instance of Backstage that is deployed and that you shape according to your needs, the thing we use in our day-to-day activities. And plugins are a cool feature of Backstage: they are how you extend it, how you give extensibility to your portal for your enterprise. Backstage essentially helps you to create, manage and explore things: creating means you can build your software stack in an effective way; you can manage your software stack in one centralized location; and you are able to connect the pieces across it. Connecting means your data is shared across the organization, so the discoverability of your tools and services becomes huge. Talking about the software model: this is the way we classify our software components. With the core entities we are trying to classify our software, how our front-end system works, how our back-end system works, and we classify them into systems like that. On top of that there are also many relationships we are able to manage: you can relate one software component to another and describe how they talk; for example, I have a back-end system and I provide an API, so we can build relations on top of that. We are also able to use annotations to extend Backstage: it uses a Kubernetes-like format to define the annotations, and that helps extend your Backstage and feed the plugins, so it is a very good thing. One thing we can do with Backstage is docs. Docs are always a major problem for developers: managing documents, for example, if I am an engineer and I want to find the technical docs or the user docs, I have to go to multiple places and multiple resources. Now, with the power of Backstage, we are able to find everything in one place, with the power of TechDocs: you manage your content as Markdown, and you can find your services with the power of search. Search makes sure that the content you put into Backstage is discoverable; search itself helps you find the content. By default it is an in-memory search mechanism, and it is able to connect the software pieces across the system: if you search for a software template, you will get it; if you search for the documentation associated with a software template, you will get it, it is already there. The good part is that while it is in-memory by default, it also supports Elasticsearch, Lunr and Postgres as back-end search engines, and those are suggested for higher-level usage: in-memory is not a good search mechanism for large-scale usage, so switching to a real search engine makes sense; it also helps service-to-service communication between plugins, and you can customize the search experience for your people.
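To make the software model concrete, this is roughly what a catalog descriptor for one component can look like; the file format is the standard Backstage catalog format, but every name, annotation and relation here is hypothetical, not taken from the talk:

```yaml
# catalog-info.yaml -- a minimal, hypothetical component descriptor
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-frontend
  description: Example front-end registered in the catalog
  annotations:
    github.com/project-slug: my-org/my-frontend   # annotation consumed by a plugin
spec:
  type: website
  lifecycle: production
  owner: team-frontend          # ownership, as discussed above
  system: my-shop               # the system this component belongs to
  dependsOn:
    - component:my-backend      # relation to another component
```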
Now I am handing over to Naveen. Thanks, Rijin. I will walk you through software templates in Backstage. All of us are aware of the famous quote "time is money", and do you really want your developers to spend time developing the same old boilerplate code every time they want to create a new application? No, right? With the advent of microservices, every team has the autonomy to choose the solution they like and that fits their purpose: for someone it might be Java, for someone it might be Node.js. Teams have this independent decision on what they want to work with, but it creates a problem called fragmentation of developer tooling: if you are working on Java and someone else is working on Node.js, there is no common developer tooling, and the only way to know how the tooling is set up is to speak with your colleague. Backstage addresses this challenge with software templates, or golden paths. A golden path is nothing but an opinionated and supported path for developing your applications. With golden paths you can bootstrap your project ideas quickly while following standard practices like clean architecture. Golden paths are not meant to restrict or limit developers; they are rather meant to be complementary, so that developers can do what they do best, which is developing applications. Golden paths also let you automate the creation of GitLab CI/CD pipelines and deployment templates for OpenShift and Kubernetes. On the slide I have included a few links to golden-path templates which are available from Backstage as well as from Janus IDP. Janus IDP is an open-source community supported by Red Hat that is currently working on Backstage, so you can check out the golden-path templates available from Backstage as well as from Janus IDP; there are plenty of them, like clean architecture, React, and Spring Boot with Helm chart deployment. Now coming to plugins: Backstage has a customizable and extensible plugin architecture. In software systems there is no one size that fits all; every organization has its unique requirements and needs a unique solution, and Backstage is really built in a way that is open for extension by default, with plugins providing the way to extend the Backstage architecture. There are a number of plugins available on the Spotify marketplace, and there are also some plugins provided by Red Hat; the link is on the slide, janus-idp.io/plugins, where you can explore all of them. This slide shows an example architecture of an internal developer platform at an organization: it has a variety of plugins, like CircleCI for managing your continuous deployment, the service catalog to manage all your services in a single place, and Lighthouse for your application analytics. It's not necessary to use only vendor-provided plugins in Backstage; it's very easy to develop your own, so all of these plugins can be mixed and matched and you can extend the Backstage architecture for your use case. I would like to mention the plugin for Matomo, an open-source web analytics platform: we developed this plugin in-house at Red Hat and it will soon be available as part of the community plugins in Janus IDP. Now we would like to go through a quick demo of Backstage. This is a Backstage home page, where you can do all of these things. First of all, I am going to show how to do the software templating and develop software with the golden paths. Software templating is at the core of Backstage; it's where you enable the developers to follow best practices and everything. Let's see how we can create a software template: we have defined a few software templates in our catalog right now, and I am using the template for a Node.js back-end application.
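A golden-path template is itself described in YAML; the sketch below shows the general shape of such a scaffolder template, with hypothetical names, parameters and repository targets rather than the actual template used in the demo:

```yaml
# template.yaml -- a minimal, hypothetical scaffolder template
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: nodejs-backend
  title: Node.js backend service
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Project details
      required: [name, repoUrl]
      properties:
        name:
          type: string
        repoUrl:
          type: string
  steps:
    - id: fetch
      action: fetch:template        # copy a skeleton and fill in the values
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
    - id: publish
      action: publish:github        # create the repository
      input:
        repoUrl: ${{ parameters.repoUrl }}
    - id: register
      action: catalog:register      # register the new component in the catalog
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```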
After that you can add your organization, whatever you need; you can add your repository name and who owns the system, and here comes the ownership of your code piece: you can define the owner for the specific code that is going to be bootstrapped, you can assign the system it belongs to, and you can modify things however you want. Basically it uses the YAML format to configure the software catalog entry for your application, and you can define the actions you need and want to follow. Once you press create, you can see that the software is getting cataloged in this specific group; I am doing a live publish of the software into the catalog, and once it's done you can see that a new repository has just been created, which shows that the code has been bootstrapped by Backstage. Similarly, you are able to track the software in Backstage, which gives you the lifecycle: we can track the lifecycle of this specific piece of software from day one. And if you want to reconfigure it later, you can just update the catalog YAML, so you are able to track everything in one place. So this is one example of software templating, and everything is managed within the catalog. This is the IDP we use right now, Janus IDP, which you can utilize for building an IDP. The Janus IDP showcase is a showcase app that is available publicly at showcase.janus-idp.io. In Janus IDP we have built plugins for managing our infrastructure and our security insights; Red Hat has a very great community called Janus, and they manage this right now. This is the showcase application, and I am trying to show you the plugins that Janus IDP has developed recently. One is the topology plugin: the topology plugin is built on top of the native Kubernetes integration in Backstage, Backstage has a native Kubernetes integration and on top of that the topology plugin has been built, so you are able to get insights into your infrastructure in one place; as a developer you can see what is happening, and if a pod is down you can see the status of the pod and take the necessary action. Another plugin is the Tekton plugin, where you are able to see the Tekton pipelines. Another thing is the image registry plugin, for Quay: you are able to see which image you are using with your application in Backstage, with some image insights and also some security insights at the end. You can also see how the system is depicted here, how the system components relate to each other: you can see what the showcase app depends on, so you can define the dependencies, and a new developer will be able to understand how the system works; that insight is shared, and if something is broken we can identify where the problem arises. And this is the documentation, powered by Backstage TechDocs, so you are able to manage the documentation in one place: you manage the content over here, and it is searchable; even here, you can search and get the guides. Backstage is that extensible; there is no limit to the
extensibility you can get from a Backstage instance. It is also able to track APIs: it can serve as the API catalog within your organization, so you can find which APIs are being used within your organization and get some insights on that. Now I am handing over to Adhavan. So, currently Janus IDP is an IDP provided by Red Hat, an open-source one. There are also a lot of commercially available IDPs, like Red Hat Developer Hub, which was recently announced at Red Hat Summit and will soon be generally available, and there are also Amazon and Roadie, who have built their own IDPs and are using Backstage as the central nervous system for managing them. Next: let's say you have decided to use Backstage as your developer platform and you want to create your own IDP using Backstage; what should the adoption plan be? The first step would be trying out Backstage; to try it out you can use the demo applications which are available from Janus IDP, or you can use the demo application from Spotify as well. The next step would be a POC, where you set up a Backstage instance in your local environment and try a few configurations, like configuring Backstage with GitHub or GitLab. The next step is building: let's say there is some requirement for which Backstage does not provide a plugin, or you want to extend Backstage, like we did for Matomo, so you build your own plugins; that's a small feature where you find out whether Backstage is really working for you and whether it is the solution you are looking for. If it works for you, the last step is to spread the word: evangelize your adoption plan and your extensions. We are welcoming contributors to Janus IDP; you can scan the QR code shown on the slide to join the Janus IDP Slack community. What is your question, sir? For an existing project? Yes, you just have to add a YAML file, your software descriptor basically, to your existing project, and Backstage is able to track things from there. There is also a VS Code extension available, I think created by Spotify; with that extension you are able to create the template formatting and the technicalities, and you can simply onboard the software with that. The next question: you explained why it is beneficial for the organization, but why is it beneficial for the developer? Sorry, just stepping in, I can answer that; the question is how Backstage is beneficial for developers. As a developer you are part of an organization; at Spotify they have a thousand-plus microservices, but it is shipped as a single product, so as developers we need to collaborate with multiple teams: the front-end team needs to collaborate with the back end, the back-end team might need to collaborate with another service, and you can't go looking every time for who the owner of this application is, what their API is, whether they provide any Swagger documentation. For all of those things Backstage provides a single pane of glass, be it for developers, for managers or for non-technical folks. Any other questions? Will you take mine? You know about application templates, right? As one of the best practices, if we are speaking about Kubernetes, maybe I would write in the descriptor what the resource limits are, how much I need to give; is it possible to put checks inside those templates? Yes, you are asking whether we can put custom
The references for this content are on the slides, and the Janus IDP pages are credited as well.

So, hi everyone, how are you guys? All set up with your evening coffee, all charged up with that caffeine shot? Okay, so my name is Ashram Bhatti and I am part of the consulting services team based in Germany, so we implement all of these products on the customer side. My colleague Moritz could not take this session because he is out implementing AAP somewhere, but his heart is in this session. Now let's begin with the agenda. Today I really want to keep this session and the slides interactive with you all, and after this presentation I would really like you all to go and play along with the demo I have created for you: you can install it on your local laptop or on three VMs in any cloud, install AAP, and play along with it. Today we will cover what exactly Ansible Automation Platform is, some good practices and security implementations for your Ansible Automation Platform as well as for the hosts you are trying to automate, so we will cover both, and then of course configuration as code, because code is important and you cannot do anything manually anymore. I have prepared a small basic demo and I will try to show you some analytics around it.

But before we move on to Ansible Automation Platform, are you all aware of what exactly Ansible does? Yes, no, maybe? Perfect. I would just say that automation is incorporated into our lives on a daily basis: earlier we used to set alarms, now we have an Alexa; earlier we used to sweep and mop our houses ourselves, now we have robots doing it. Similarly, Ansible is an open source tool which automates your entire infrastructure and your IT departments, and Ansible Automation Platform is a box of goodies that comes with a set of tools which help you automate multiple tasks and resources together in your infrastructure team. The first part is the operations side: you have a UI provided by your Automation Controller, and your Automation Hub, which we are going to see later, contains your own execution environments, your collections, and so on. To support that you need Ansible content creation: how are you going to create your Ansible playbooks? We have certain content creation tools, ansible-builder and so on, with which you can create your execution environments, create your playbooks, test them, and whatnot. And finally, when you have all of that in place, you need to visualize what is running perfectly and what is failing, and you need a nice analytics page to identify and view all the tasks. That is all provided by the AAP platform.

So now we are going to dive into some basics of the Automation Controller which we just saw, and some of its objects. The Automation Controller, as you see, is a UI-based tool where you perform all the operations, where you define all your playbooks, and it comes with a role-based access mechanism.
So whatever access you want to grant on an object, that is very easy. Second, it also has an API, so if you want to call an API to create something or to modify an object, that is possible too. Plus, if you want centralized auditing, who did what, that is really important: who messed up my environment, how can we resolve it, that is also possible with this, and you can do a lot of other things; I am just covering some of the basics.

Now, projects: imagine that you have a set of objects which you define in your Automation Controller. For your projects, let's say you have created some playbooks and roles; you would just use an SCM integration to import all of this code into your Automation Controller. That is just the building block, and if you look at this image, especially this config-as-code repository, that project code is what we will be creating in the remote SCM. Inventories: for example, say you want to start your own IT company or you have your own startup, and you want to perform certain tasks on the servers your application is running on. Those particular servers get added as what is known as the inventory; they are the actual hosts on which all this automation is applied. They are logically grouped, and again it is all secure, because you can define permissions for who can access your inventory. For this demo we have created another project based only on the inventory. Let's say I have 1000 hosts which I want to automate; I need to group them logically and say, okay, my development environment has 50 hosts, my testing environment has 100 hosts, and so on. Then job templates: I would call them the glue. It is again an object which holds everything, so whatever project you define, whatever playbooks, whatever execution environments and credentials, they all attach to this job template, and the best part is that it is reusable: you create this job template once, it runs your playbook, and then you are done, so you can reuse it again and again for your multiple automation tasks. Workflows, to make it simple, are just a combination of your job templates; it looks like a pipeline, but it is really job templates chained together. These are some of the basic terminologies we use in the Automation Controller, and trust me, by now you are already one step closer to being an AAP expert.

So let's talk about some good practices and security implementations. To begin with, what exactly is wrong with this picture? Any guesses? Exactly. When we start developing, at a later stage our code starts to get messy, but if we start with an organized way of writing code it is much easier to understand and much easier to manage. So that is one of the good practices: you need to organize everything, your roles, your inventories, your playbooks, in a proper way. That is, in my opinion, the basic step for implementing good practices in your team or in your organization. Another example is that you need to create simple playbooks and inventories. If you look at my playbook, it is just a few roles which I have defined; I have not written any complex logic over here, I have not defined any conditionals or loops over certain roles, I have just kept it simple, and all the logic resides in the roles themselves.
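A minimal sketch of that style of playbook, assuming hypothetical role names; there are no loops or conditionals here, only roles:

```yaml
# site.yml -- a simple playbook; all the actual logic lives inside the roles
- name: Harden and configure the hosts
  hosts: dev
  become: true
  roles:
    - disable_root_login
    - update_packages
    - configure_webserver
```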
Similarly, if you look at my inventory, I have labeled it according to my environments: the "all" group contains everything, my "dev" group has my dev hosts, and if I want to club them together I have a child group which contains my dev and test hosts. That is how you manage your inventories.

Another important thing is infrastructure as code. We have heard this term a lot, but when we go to customers we see that it is still not implemented properly; the versioning of infrastructure as code is not done properly yet, and there is huge scope for developers, IT folks, and operations folks to collaborate and build it properly. In this example, let's assume you are an Ansible playbook developer or an infrastructure developer: you would do exactly the same things you would do as a Java developer or a Python developer. You commit your Ansible playbook code to the development branch, you use the best tool set available, IDE extensions or anything else, then you do a proper peer review and approve it, the code gets promoted to your validation branch, call it a testing branch or whatever, and then it gets merged to the branch your actual production system runs from.

Now, when you talk about security, integrating with LDAP-based systems is again a necessity, because every enterprise we talk to, any company, be it a startup or a medium-sized company, is using some LDAP mechanism or LDAP servers, and with Automation Platform you can seamlessly integrate all of it. We have done these integrations for multiple companies, and it is pretty easy with Automation Platform. Then, any ideas what this is, what are we talking about? It definitely doesn't have gold in it, exactly. You need to use vaults and credentials in your controller; you cannot just put plain passwords in your Git repo, that is not recommended and not a good practice at all. What you do is integrate with one of the multiple enterprise vaults available in the market, or an open source vault, that is up to you, and store all your secret data in those vaults; the Automation Controller connects to that vault to fetch the values, which again makes it more secure. Using vaults and secret management is one of the good practices when you are using Ansible Automation Platform.

Then there is server hardening. If you have worked on Linux, I am pretty sure you must have hardened your servers: made a lot of changes to the default services that come with a RHEL or other Linux server, disabled root login, kept all the packages up to date, set up SSL/TLS certificates, auditing, and so on. Now a question for all of you: when we did all of this manually, or with shell scripts or other additional tools, how much time did it take to set up? Any wild guesses? A long time, really long. Of course, in the demo we are going to have today I will automate all of this in the workflow we just saw, and we will see how much time it takes. Then one of the best practices we are talking about is configuration as code. To give you a brief example, I have created three repositories. My first repository is where all my Ansible code lives; it has all my roles to disable root login, update packages, and so on. The second repository is the inventory, the hosts on which all these job templates will actually run.
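To picture that inventory repository, here is a minimal sketch of an environment-grouped inventory in YAML form; the host names and the group layout are hypothetical:

```yaml
# inventory/hosts.yml -- hypothetical hosts grouped per environment
all:
  children:
    dev:
      hosts:
        dev-host-01:
        dev-host-02:
    test:
      hosts:
        test-host-01:
    nonprod:           # a child group that clubs dev and test together
      children:
        dev:
        test:
```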
My third repository is the configuration as code, which downloads my AAP, configures it as per my wishes and the good security practices, and then wires all the automations we defined in the playbooks repo to all my hosts. With this approach we are not just configuring our Ansible Automation Platform, we are also telling the Ansible Automation Platform to perform all these tasks on all my hosts, so it is all in one. For this demo I have used two organizations, org_storage and org_unix, and we have defined role-based access controls over here. The teams are the storage admins and the storage developers: the storage admins have admin access to all the objects we just talked about, and on the Unix side the admins have all the access, the developers have execute permissions, and the operations teams can just use it, so they cannot modify or edit any of the current job templates or any other project.

Well, that is my favorite line, a Torvalds one: we have seen a lot of presentations, so let's talk about code and how exactly it works. This is my Git repo which defines my Automation Controller. What you need is just three VMs. On this laptop, I am using a tool called UTM and I created three RHEL machines, one for my controller, one for my Automation Hub, and one for my DB; I just installed plain RHEL on these three machines, nothing else. When I run my playbook, which is called install-configure, it downloads AAP from the Red Hat console, installs it, and then configures it. I only need to define my inventory file. This is my inventory: the first part you see over here asks which Automation Controller host you have, so I have defined my controller host, I have defined my Automation Hub, and these can be plain IPs of your local VMs if you are running it on your local machine, and then my database, that is all.

Now, if you see, it has already installed my Automation Platform, and this is how it looks. Okay, and the password is incorrect, so let me just grab the password. Okay, so that is my dashboard; this is everything we were talking about, what exactly the Automation Controller is. On the left-hand pane, you see all the resources we discussed briefly: templates, credentials, projects. Now, by default this controller comes with a lot of things, so I can define certain settings. By default, when you install an Automation Controller, you would see that there is a list of modules which comes along with it, which includes shell and other things too, but when I was installing this I changed that setting during installation. If you look at this repository, these are all the generic settings or generic objects which I want in my Automation Controller, and over here in my settings.yaml I have provided just these two entries to make my Automation Platform more secure: I do not want a user to run an ad hoc shell command and reach out to my machines and create issues for me. Then I have whatever settings I want, which you see in my controller over here; all of these can be configured via these playbooks, via these settings, it is all there. Then, if you see, I just have the basic users over here, admin and student, which come by default, and I want to add certain new users through my configuration code. And if you see over here, okay, let's delete these; well, deleting things in production is not a good practice, but let's see. Okay, so it is all gone.
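Going back to that settings.yaml for a moment, here is a rough idea of how restricting ad hoc commands might be declared; the variable layout assumes the controller configuration collection's name/value settings format, and the allowed module list is hypothetical:

```yaml
# settings.yml -- illustrative only; the exact schema depends on the collection version
controller_settings:
  - name: AD_HOC_COMMANDS        # which modules users may run as ad hoc commands
    value:
      - ping
      - setup                    # note: shell and command are deliberately left out
```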
Now, from my terminal, I will run this particular command. Before running it, let me show you a few more things. For the Ansible playbook, which vault does it use? By default, whatever credentials I have used, I have vaulted them using the standard Ansible Vault, and all those vaulted values are stored in Git, so everything is secure. Then I am passing my inventory, which tells it which inventory and which hosts to connect to, I am passing my limit as the dev environment, which we just saw in the inventory, and then the playbook name. It is going to ask me for a password, because that is the password I used to create the vaulted files, plus my environment variable. As soon as I run it, it starts creating everything from scratch, and if I just put time over here, let's see how much time it takes to create, and then we will come back to the controller. But up until now, did you guys understand some of the basic concepts of AAP? Any queries so far? Okay, cool.

So now, if you see, this gives me the logging: it tells me, okay, it is adding the credential type, it is configuring my controller credentials, it is adding my credentials, and if I show you here it has already started adding my credentials. So now I have all the credentials defined in my config code, and I am not doing anything manually over here. If I look at the projects, these are the SCM projects which got created, and as you can see they are still running, they will fetch their revisions. Now it is adding the inventory and inventory sources. With this configuration code you can literally do 100% of the things: you can define whatever you need on your hosts, you can define your organizations, which we talked about, org_storage and org_unix, and then we saw teams, and all of that is present here. In my "all" I have my, okay, it is in my development environment, that is my development environment, and this is how I have structured it: I need certain teams that are only part of my development environment, and certain teams for the production environment, so that is how you can separate or segregate this. If you go and look at my dev environment, I have defined certain teams over here, team unix admin, devcon, so I have four teams defined, and now, if you see, it has created them, and it has not only created them, it has also linked these teams to my organizations: my first two teams are part of the storage organization, my other two teams are part of the Unix organization. Okay, this is finished; let's look at our timer, two minutes 23 seconds.

Now, if I come here, these are the job templates we were talking about, the ones you can reuse, and it has created a workflow template for me, so let me just show you. Okay, this is running. It has created this entire set of rules for me, which I defined in my workflows.yaml file; it has linked all these job templates, and as the slide we just saw showed, you can have multiple job templates linked to each other as a workflow template, and this is how it works: disabling the root password, updating all the packages. If I dive into it, this is the first task, it is configuring my web server, this is configuring my services, this is disabling the root login. If we look at the output, these are dummy tasks, but you can see in the output that it ran on my defcon host, which I defined in my inventory, and then it just displays the messages for now. But let me show you a few more things: this is how I have defined my workflow in my code, so I have put down my first job template, which has a success node and a failure node, and you can also have approval nodes over here.
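Purely as an illustration of what such a workflows file might look like, here is a sketch using the simplified workflow-node style from the controller configuration collection; the node identifiers, job template names, and the exact schema details are assumptions, not the speaker's actual file:

```yaml
# workflows.yml -- illustrative sketch; schema and names are assumptions
controller_workflows:
  - name: harden-hosts
    organization: org_unix
    simplified_workflow_nodes:
      - identifier: disable-root-login
        unified_job_template: Disable root login       # first job template
        success_nodes:
          - update-packages                            # runs on success
        failure_nodes:
          - notify-failure                             # runs on failure
      - identifier: update-packages
        unified_job_template: Update packages
      - identifier: notify-failure
        unified_job_template: Send failure notification
```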
You can do a lot of things with the code itself. So within just two minutes and a few seconds, let's say three minutes, I was able to harden my hosts by using this config-as-code repository. Now there is a third part: we talked about analytics at the beginning. What you see over here is my analytics console, which is on console.redhat.com; it gives me a brief view of which job templates ran, how many of them failed, and whether I am saving some money, which is really important, by the way. If I come here and look at my organization statistics, it loads information about the organizations and then tells you what clusters I have, so you can get all of it over here. This is what Ansible Automation Platform is all about: you get everything in a single package. And now, for you, if we go back to our slides after the analytics part, as next steps I have provided all the links to the GitHub repos, which are available for you. I just want you to try it out, play along with it, and give us your feedback on what you think about it; that is all we need. This entire PDF is linked to the session, so if you want to download it, feel free to do that. And now it is time for Q&A. Any questions, guys?

So, for the audience, the question is how we are configuring this entire thing, whether there is a module available. There is a module available, the controller configuration module, and I can show it to you in the code as well: if you come here and look at the playbooks, the playbook we ran was controller-config, which configures everything, and it uses the controller configuration module over here. So that is there, but developers tend to use... sorry, any other questions? So the question is where we can run Ansible Automation Platform, on Kubernetes or directly on RHEL VMs. You can deploy it on your RHEL VMs; I told you that I used UTM as my virtualization tool and I just created RHEL 8.6 VMs and installed it there, well, I didn't install it, the configuration as code installed it. If you want Kubernetes, that is also available and can install your AAP platform, so that is possible too. For the Kubernetes installation I would need to check; on OpenShift, if you are an OpenShift admin and you look at the operators, there should be an AAP operator available which you can install, but for the exact documentation I think I will get back to you with the exact link. Okay, so the question is whether there is a recommended practice for installing AAP. The default requirement is that it needs a RHEL server, and beyond that it depends completely on the organization where you want to install AAP: if you say, okay, we need OpenShift, use that; if you say we need to install it on VMs, go for it. So yeah, it needs a RHEL system. I think there are no more questions, okay. Then, if there are no more questions, I would request you to let me know your constructive feedback on this and give it a try, and if you need any assistance feel free to ping me, I am available on Google Chat or on LinkedIn. That is it from my side, guys, thank you for attending.

Hi everyone, thank you all very much for coming, and I really appreciate your navigation skills; it is not easy to get here, this place is like in the wilderness. My name is Oel Mizan, I am a senior software engineer at Red Hat working on the OpenShift Virtualization networking team, and today I want to present to you my little talk: my cluster is running, but does it actually work?
So, just to feel out the audience: has any one of you ever tried to configure a Kubernetes cluster or administer it? Well, about half of the attendees. As you know, it is not really easy to do, and when you try to add more features to it and extend it with third-party libraries or components, it gets even more complicated. I want to show you how, using automation, we can verify that our cluster works as intended. The agenda is to present the problem, to give a concrete example using networking capabilities, to talk about the advantages of using automation to achieve this verification, and to talk about the solution we wanted to bring: what requirements we had for it, what a checkup is, how you configure it, and how you execute it. I will give a little demo, maybe two if we have time, talk about existing checkups we already have, how you can write your own checkup to test whatever you need from your cluster, and then the conclusions.

First of all, the problem. Sometimes you have special requirements for your cluster, let's say in the compute, networking, or storage domains; sometimes you want to support dedicated hardware. There are many, many moving parts in Kubernetes, and if you put more add-ons on top, it gets even more complicated, so the configuration is not always straightforward. It can be time consuming, you need to dig hard into the documentation, sometimes even into the code itself. There is the moment when you deploy the cluster initially and want to understand whether it actually does what we intended it to do, and you also do day-two operations, like updates or configuration changes during its life cycle, and some things can break. So how do you know that your cluster actually works? In this talk I will be talking about KubeVirt, and all you need to know about KubeVirt is that it is a Kubernetes add-on that allows you to run virtual machines alongside containers on a Kubernetes cluster; that is all you need to know. By the way, are you familiar with KubeVirt, does anyone here use it? Okay, so about three people. We have a booth near D105, please come visit us, it is a very interesting project.

Let's give a concrete example. We have two worker nodes and a very high-speed network using SR-IOV; you don't need to understand what that means, it is specialized hardware with a lot of components that have to be configured for it to work, and you want virtual machine one to be able to communicate with virtual machine two through a switch. Everything is dedicated hardware, everything is high-speed networking: how do you verify it? You can do it manually, you can just spin up two virtual machines and run a ping between them, or some other program like iperf, to understand whether you have communication over this network or not. But that can be time-consuming, it can be human-error-prone, and it is also not reproducible: today you apply one manifest, tomorrow you use another, so it will not be the same, and it will probably not be portable between clusters. The advantages of automation are that it is fast, you don't need to think a lot, you just activate the automation and get what you want; it is reproducible, today and in a month and in a year it will do the same things; and it is portable between clusters, you can use it on your cluster today and on your customer's cluster tomorrow and it will be the same.
It also hides a lot of complexity: you don't need to be an expert on all of the subjects that the automation contains inside of it, and of course it is less prone to human errors. So what is the solution we made, and what were the requirements for it? We don't want to use any specialized clients, just kubectl and plain YAML files, nothing custom. It should not leave any leftovers after it does its thing: if we are testing communication we are spinning up VMs, and we don't want to leave them behind after the checkup has completed. It should be deployable and usable by a user that is not the cluster administrator, so everyone with sufficient permissions can use it, and it should be able to interact with existing objects in the cluster.

So what is a checkup? A checkup is a Kubernetes application, a plain application that is used to verify that a cluster functionality actually works as intended, and like all Kubernetes applications it needs two things: a container image containing the business logic of the checkup, and a service account with RBAC rules that permit it to do things with the Kubernetes API, like creating and deleting objects and so on. How do you configure and execute a checkup? The configuration is very simple: you use a config map, which is basically a map of strings to strings, you say what keys you have and what values they hold, and that is the configuration. You link this configuration to a Kubernetes job, which is a wrapper over a plain pod, and you execute the job; the job does its magic, it ends, and your cluster stays clean, as we will see in a minute in the demo. After the checkup finishes, you go to the same config map you used to configure it, so you can read the results, save them aside for later investigation, or just remove it if you don't need it anymore.

This is the first checkup we ever made, called the VM latency checkup; it was a proof of concept. What it does is: you have a config map that you put all your configuration into, then you have a Kubernetes job that is mostly boilerplate, you hardly need to touch anything there except for the two environment variables that tell it where the config map is. It spins up two virtual machines, we connect to the serial console of one of them, we ping the other virtual machine, and we test whether we have connectivity and what the latency is. Of course, future checkups could use fancier tools, but for this proof of concept ping was sufficient. After we are done, we report that we have connectivity and that we have measured the latency between the two virtual machines, we delete the virtual machines so everything is cleaned up, the job finishes, and it writes the results to the config map. This is how the configuration is done for this specific checkup. You can see that we are using plain keys, because we could not use a CR for this checkup: we wanted non-cluster-admins to be able to use it, and only cluster admins can deploy CRDs, custom resource definitions, so we had to use config maps. The config maps have a specific structure: we have a timeout that says after how much time we decide to kill the checkup if it lags or gets stuck, and we have the parameters, the spec.param.* keys, which we use to define the checkup itself. Here you can see that we use the network attachment definition; you don't need to know what that means, only that it is an object that already lives on your cluster and that the checkup can interact with.
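Roughly, such a configuration config map could look like the sketch below; the namespace, the names, and the extra latency parameters are illustrative assumptions, with the key names written from memory rather than copied from the project:

```yaml
# vm-latency checkup configuration -- illustrative sketch
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
  namespace: target-ns
data:
  spec.timeout: 5m                                           # kill the checkup if it gets stuck
  spec.param.networkAttachmentDefinitionNamespace: target-ns
  spec.param.networkAttachmentDefinitionName: sriov-network   # an existing object on the cluster
  spec.param.maxDesiredLatencyMilliseconds: "10"
  spec.param.sampleDurationSeconds: "5"
```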
And this is the checkup job example. You can ignore all the boilerplate, it doesn't matter; all we care about are these two environment variables that tell the checkup where the config map is located, in which namespace and under which name. After the checkup has completed its run, it writes the results to the same config map specified earlier: you can see here that the checkup succeeded and there are no failure reasons, and all the rest are details, when it started, all the measurements it did, which nodes the virtual machines were scheduled to, and so on. You can also see the failure case: if the checkup fails, you see that you have a failure and what the reason is; in this specific test case the virtual machines could not communicate over the network, so it tells us that we have a connectivity issue.

The main flow of these checkups, this class of application, is: the first thing they do is fetch the user configuration from the config map. The second is doing all the setup, in this case spinning up the two virtual machines and waiting for them to boot. Then we come to the checkup's body, the heart of the checkup: we connect over the serial console to one of the VMIs, we issue the ping command towards the target VMI, and we check whether there is connectivity between them. The second-to-last step is the teardown: after we have discovered whether or not we have connectivity, we delete both VMIs and wait for them to be disposed of. And the last step is reporting the result.

So let's see a demo. Here you can see that we query for the network attachment definition; it is just an object living on the cluster, you don't need to understand what it does, it just represents a network in the cluster. The next thing we do is apply the RBAC permissions for the checkup so it can do its magic, like creating and deleting virtual machines and connecting to their serial consoles. Next we configure the checkup using a plain YAML file: we tell it that we want to use a specific network attachment definition and a specific checkup duration, and we apply it. Next we define the checkup's job, which as I said is all boilerplate, you don't need to change anything except where the config map is located, we apply it, and we wait for it to complete; you can use the wait command, or you can just poll it, or just look at it in the evening when it ends. And here we get the results: you can see that the checkup completed, along with the rest of the status. After the checkup has completed, we can delete the checkup's job, delete the checkup's config map, and delete the RBAC rules if we don't need them anymore; if you want to run the checkup again you can do that, you just don't need to delete the RBAC rules. And as you can see, when we try to get the virtual machines we see that there are none, because the checkup cleaned up the VMs it had spun up, so our cluster stayed clean and we got our results: we know that the cluster can run this specific workload on this specific network.

So, existing checkups: we currently have three checkups in the works. The first one, the VM latency checkup, is the one that was just demoed; it is the most mature, but for us it is the proof of concept, and we want to use the lessons learned from it to build more advanced checkups.
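For reference, the mostly-boilerplate job mentioned earlier could look roughly like this; the namespace, service account, and image reference are assumptions, and only the two environment variables really matter:

```yaml
# checkup job -- illustrative sketch; only the two env vars need editing
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
  namespace: target-ns
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa              # bound to the RBAC rules applied earlier
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
          image: quay.io/kiagnose/kubevirt-vm-latency-checkup:main   # image name is an assumption
          env:
            - name: CONFIGMAP_NAMESPACE                       # where the config map lives
              value: target-ns
            - name: CONFIGMAP_NAME                            # the config map's name
              value: kubevirt-vm-latency-checkup-config
```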
The next checkup we are currently working on is called the Kubernetes DPDK checkup. DPDK is a networking technology that uses kernel bypass to push very high network throughput, and configuring it takes a lot of knobs; you need to really understand what you are doing, and in the end you want to know whether it actually works or not. We use the VM latency checkup and the DPDK checkup one after the other to tell whether a cluster can actually run DPDK workloads. This is our main focus at the moment, and we are trying to stabilize it and make it productized. The last checkup is the baby checkup, still in the initial stages of development: the Kubernetes realtime checkup. What it does is make sure that you can run realtime workloads on a Kubernetes cluster, say a realtime application driving some kind of machinery in a factory, and it verifies that you can actually do that. This checkup is in very early development and will be ramped up in the next few months.

What if you want to write a checkup of your own? We have the Kiagnose project that holds all three of these checkups. It provides a Go library that helps you query the config maps and write the results back to them, and it has the VM latency checkup as a reference that you can take and use to build your own. You also don't need to use Go; you can do it in whatever language you feel comfortable with that can query the Kubernetes API. For the conclusions: cluster functionality should be checked whenever the cluster changes, whether that is building the cluster on day one or changing it on day two, and using checkups makes this process faster, reproducible, and less prone to human error. You are all welcome at the KubeVirt booth next to D105; we have Andrew Burden, our community manager, there, and Peter Horacek, who manages the network team. Come pay us a visit, you are all welcome. Thank you very much. We have time for questions, and another demo if you want; let's start with the questions if anyone has any. Yes, please.

The first question was why we are using a config map instead of a CRD. As I said during the initial part of the talk, we cannot use a CRD, a custom resource definition, because one of our requirements was for a non-cluster-admin to be able to install and execute the checkup, and a non-admin user cannot deploy a CRD, thus we cannot use a CR. Thank you for your question. Any other questions? Okay, so we have time for another demo, of the DPDK checkup, the beefier checkup. What it does is spin up a traffic generator pod, an application that knows how to generate a lot of traffic in a short amount of time, and send this traffic over the network to another VMI that takes the data and transfers it back; it is like the VM latency checkup, but on steroids. The configuration is pretty much the same and the flow is pretty much the same; the actual business logic inside is much more complicated, but for the user it is all the same. We want to see that we have our network attachment definition, the thing that gives us the configuration for using the network. We apply the config map, and as you can see it is almost identical to the VM latency one, just with another image name, so it is pretty much the same. We apply the job and we wait for it to end; of course this is not in real time, it takes several minutes to finish. We get the results and we see that we have zero packet loss, which means we can use this cluster for our DPDK workload. We delete the job, we delete the config map, we delete the RBAC permissions, and we keep our cluster nice and clean.
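Just to visualize what reading the results back means, the data section of the same config map could end up looking something like this after a run; the status key names and the values below are illustrative and written from memory:

```yaml
# config map data after the checkup job completes -- illustrative values
data:
  spec.timeout: 5m
  spec.param.networkAttachmentDefinitionName: sriov-network
  status.succeeded: "true"                        # the checkup passed
  status.failureReason: ""                        # empty when there is nothing to report
  status.startTimestamp: "2023-06-16T15:04:05Z"
  status.completionTimestamp: "2023-06-16T15:05:05Z"
  status.result.avgLatencyNanoSec: "420000"       # one of the measurements taken
  status.result.sourceNode: worker-1              # where the VMs were scheduled
  status.result.targetNode: worker-2
```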
Any other questions? Yes, please. Okay, the question was why the config map keys look similar to a regular Kubernetes object. The reason is to make it feel familiar to people who are used to custom resources; this is a workaround, since we cannot use custom resource definitions we are doing the next best thing, but on config maps. Thank you. Are there any other questions? Okay, thank you very much, and thank you all for coming at this late hour.