Thank you for being here. I like the number of people. I am going to present an open-source software tool for comprehending huge code bases. Codevis is software developed by Codethink, and it's part of a project sponsored by Bloomberg. This talk has two main parts. In the first part, I'm going to give an overview of the tool itself and how it works. (There is a little bit of echo, so I must be careful.) In the second part, I am going to show you how I used the tool on some KDE software as an experiment.
With the first slide, I wanted to make sure everyone gets an idea of roughly what the tool looks like. This is the graphical user interface part of the tool. You can load your C++ code and generate a relational database representation of it, and then open this database in the interface. You can drag entities and move them around. There are plugins that can paint them in different ways. You can bring in the external dependencies and inspect them as well. You can create tabs, split views and so on. Everything related to the interface is pretty much what you would expect from CAD-like software: zoom in, zoom out, filter. Filtering is really useful, because when you have a large C++ codebase you sometimes have too much information on the screen, so you can filter out other things and visualize only what you want to see. I just wanted to make sure this was covered, which is why it's the first slide.
The tool is based on the ideas of John Lakos, as written in his books. I have some copies over here if anyone wants to take a picture. I won't cover everything, because this thing is huge, but I need to at least give you the basics. The basic idea is that you have two kinds of aspects.
You have the physical design and the logical design. For the physical part, the idea is to have components. A component is a pair of C++ files: a .cpp file plus a header file. You put those together and say you have a component. You may also have a test driver for the component, usually named after the component with a .t.cpp suffix, where t stands for test. The logical part is the classes, structs and everything that's real code. The physical part is the files, folders and the structure outside the logical domain.
So, again, logical-level entities are those round-shaped things: classes, structs or something like that. Classes go inside a component, which, like I said, is a .cpp plus a header file in this model. Components may depend on other components, in which case you have an arrow. If there is an include from a header file to another file, you have a dependency between components, and we draw an arrow between them. This is the component-level dependency. If you put many components inside a folder, what you have is a package. And if you have many folders inside another folder, you have a package group. That is the basic idea of the organization suggested by Lakos in his books. Of course, there is a lot more to cover. This is just so that everyone gets an idea of what I'm talking about.
You also have this levelization idea: things that depend on other things go above the things they depend on, so the less dependent something is, the lower it sits. John talks a lot about levelization techniques in his books.
For the tool itself, we use LLVM to parse the code, that is, to read your code base. We have two steps. The first step is a physical scanner, which finds all the source files and the includes from one file to another.
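As a rough illustration of the physical scan described above, the following Python sketch derives component-level dependencies from `#include` lines. It is not how Codevis works internally: the tool parses code with LLVM, whereas this sketch applies a regular expression to in-memory strings. The file names and the helpers `component_of` and `scan_includes` are hypothetical, and only quoted includes are considered.

```python
import re
from pathlib import PurePosixPath

# Matches lines such as:  #include "pkg/foo.h"   (angle-bracket includes are ignored)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def component_of(path):
    """Map 'pkg/foo.h', 'pkg/foo.cpp' and 'pkg/foo.t.cpp' to the component 'pkg/foo'."""
    p = PurePosixPath(path)
    stem = p.name.split(".")[0]
    return str(p.parent / stem)


def scan_includes(files):
    """files maps a path to its source text (hypothetical input format).

    Returns a dict: component -> set of other components it includes.
    """
    deps = {}
    for path, text in files.items():
        src = component_of(path)
        deps.setdefault(src, set())
        for included in INCLUDE_RE.findall(text):
            dst = component_of(included)
            if dst != src:  # a .cpp including its own header is not a dependency
                deps[src].add(dst)
    return deps


# Made-up miniature code base: two packages, one component each.
files = {
    "util/str.h": "",
    "util/str.cpp": '#include "util/str.h"\n',
    "app/main.h": '#include "util/str.h"\n',
    "app/main.cpp": '#include "app/main.h"\n',
}
print(scan_includes(files))
# {'util/str': set(), 'app/main': {'util/str'}}
```

Grouping files by stem is what turns file-level includes into the component-level arrows of the model. A code base that does not follow the one-header-per-.cpp convention would need a different `component_of`.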
We grab that information and put it in the database. The second step is the logical scanner, which finds all the classes, structs and everything related to the logical level, plus the relationships between those and the physical level. Everything goes into the database, which is a relational database. Finally, you can visualize the database in the tool, as I said before.
There might be some problems, though. Not everyone writes code this way. For instance, in KDE it's not necessarily true that you have one single header file for a single .cpp file, so that breaks the rule. To solve that, and perhaps solve is not the right word here, but to be able to use the tool in situations where a component is not necessarily a .cpp plus a header file, we ended up bringing in the idea of semantic rules, which are optional. If you have a different way of composing your components, you can say: hey, I don't do it like that, I do it a little bit differently, and I will tell you how I do it. When you parse the code and do the physical scan, please take my rules into consideration, so that I can still use the tool to bring some semantics to my database. These will be used in the second part to parse some KDE code.
OK, a features overview. In the graphical user interface, if you have a relationship between two packages, you don't have to inspect each package to see what the dependencies between those two packages are. Imagine that you have another folder, and some components in the first folder depend on some components in the second folder. Let's say you want to remove this dependency. Which files do you think you need to modify in order to remove it?
Well, the tool gives you an option to quickly inspect which files in package A depend on which files in package B. So you get a list of components that you may have to change in order to break the dependency between the packages. This is the dependency inspector.
Dependency types. Sometimes a component depends on other components, but only for testing purposes. It's not a dependency directly in your code, it's a dependency in your test driver, and you can recognize those. You can also tell the software: I have a set of rules saying that I'm allowed to depend on this particular component, even if I don't necessarily depend on it. If you give it those rules, the tool can help you visualize whether you are allowed to depend on a subset of components. This is optional, by the way. It's just something that we have.
OK, code generation. I need to be careful here: code generation here has nothing to do with artificial intelligence or anything like that. It's just that we have templates, and you can loop over all the entities in your representation in the user interface and generate files for them. In the tool, you can not only load and inspect your source code. You can also create new entities, new packages and new components, and put relationships between them interactively, directly from the user interface. So you can first create your design and then generate code from template files based on that design. This works through a Python script that loops over all entities. For each entity, you have a Python function where you say what you want done, which is basically: create a file and fill in its contents using this template. The tool puts everything together.
Plugins. We've been working on several plugins for the system, because we want the community to be able to produce more and more plugins and more and more functionality.
That gives you the possibility to inspect and visualize your code in different ways. Today we have two types of plugins: C++ plugins and Python plugins. The difference is basically the language; they do basically the same thing. (I just noticed that when I do that... can everyone hear me well? OK, that's fine.) We have some hooks in the code, and in some specific places you can implement a hook and say: I want to plug something in here and do something different. It's the basic plugin system thing. Just so you are aware, the Python plugin infrastructure is a work in progress.
For instance, I've developed a code coverage plugin. I'm really sorry about the quality of the image, by the way. I had to resize it so everyone could see it, and it came out a little bit ugly. Anyway, let's say you have your code base loaded in the tool. It's a bunch of components, the red and orange things on the right side. You can paint those based on code coverage information. Red means the component is not tested, or has very few tests attached to it. Orange means you have some tests but could do better. You can also show the code coverage report, which brings up a little panel on the side where you can inspect which parts of the actual code are covered by tests. As a plugin, this is done using gcov behind the scenes. We don't do anything related to the coverage itself. We just bring the functionality of external software into ours as a plugin.
CI tools, or command line interface tools, to be honest. You have the possibility of generating only the database file, that is, only the relational database and not the project file for visualization, if you like.
We've been using the command line tools as a way to produce output for our own system from the CI. We run the CI, and as artifacts from the output you get those two files, the .db and the .lks. You can name them whatever you like, but you have a DB file and an LKS file. The DB file is the raw relational database. The LKS file is the database plus some metadata so that you can visualize it in the graphical user interface. So you have this set of tools that you can use either in CI or just from your command line. Code generation can also be done either from the command line or in the graphical user interface.
The second part is applying all of this to KDE software. Not everything: I just picked some software and tried the tool to see how it works and whether I had any issues, and tried to fix the issues as I went.
So, the KDE Frameworks. I'm using KF5 here because that's what I have. It's a set of 83 add-on libraries for programming with Qt (I'm just reading right now), divided into categories and tiers. Tier 1 frameworks have no dependencies within the Frameworks. Tier 2 can only depend on Tier 1. And Tier 3 can depend on any tier, including itself. Take KIO, the resource and network access abstraction. I don't work with it and I don't know anything about it, so I just tried to use the tool to inspect it. You get this kind of image, where you can see the relationships between the KIO package and the other ones. But it's not necessarily easy to see the tiers here. So I created a Python plugin to paint each project in a different color, so that I can verify that KIO is in fact Tier 3. I can also see at a glance that the other packages are Tier 2 and Tier 1. I'm only showing KIO and its dependencies here, not the dependencies of the dependencies.
So it's one level only. That's why you have a bunch of arrows coming out of KIO but no arrows coming out of, let's say, the ones below. It's unreadable, I know. But the point is that the uppermost thing is KIO, and the ones below it are its dependencies. The relevant part of building such a plugin is really straightforward. You just iterate over all the entities being shown in the interface and paint each package in a different color, depending on which tier you think it is in. It's a very, very basic plugin.
Kate. I chose Kate because it's one of my favorite pieces of KDE software. In Kate you have roughly two layers: the actual application and many, many plugins below. It's unreadable, I know. But if I want to inspect the plugins, I can pick one, open it, and filter out everything else. I don't know anything about these plugins. I didn't open the source code. But I can start to understand how one works only by looking at its internal architecture. (Oh, I'm sorry it's cut off over there on the right, but I hope it's easy to see.) Anyway, you can see that there is a hierarchical structure going on. You can also see that there are several cyclic dependencies, which is something you may want to get rid of, depending on the case.
You have very complicated plugins. Some plugins have many cycles. It's hard to see which component sits at which level, or what the set of dependencies of a single component is. For this particular plugin, and it doesn't matter exactly which plugin it is, let's say I choose it and I'm trying to understand it.
If I read the source code, I will see a bunch of includes going between the files, but I won't get a general idea of the structure of the plugin. By looking at it with the tool, I know that, oh, wow... Can you see my mouse? No? Yes. I have a component over here that looks like it has many, many dependencies, and other components down here that don't have any dependencies at all. The ones without dependencies are perhaps utilities or something like that. I can start to understand the code by looking at its structure. That's my point.
Still on the plugins, one thing I've noticed is that even the plugins with a small structure struggle to get rid of the cyclic dependency between the plugin itself and the plugin's main widget. It seems like everyone thinks that the plugin and its main widget must somehow depend on each other. I don't know why. It's just something I noticed, and maybe it's useful information.
So let's pick one and try to, let's say, fix it. This is a very simple color picker plugin that had a cyclic dependency for some reason. I found out it was just an unused include. The tool, like I said, goes through all the includes and says: you have a dependency from this component to the other component, because I found an include between them. But if the header is not used, I can remove the include, and then I don't have a cycle anymore, because there was no real cycle to begin with. You can also use other tools to find unused includes, but apparently no one did, so I picked this one specifically. This is the kind of thing you may want to try.
Konsole, which we picked because Thomas is working with me and he's the maintainer of Konsole.
Konsole has, again, a mixed architecture. You have some cycles between the packages. I picked one, and again let's try to resolve the cyclic dependencies as an exercise. This one is a little more interesting, because to remove the cycles I needed to change the code a little bit. Let me go back to the first image. I have two cycles, and I want to remove them in order to have a more natural architecture without cycles. For the first one, I moved some code out of one component to avoid the extra dependency. For the other, instead of depending directly on the other component, I created a signal and went through the signal, so that there is no explicit call to the other component. After that, I get this layered structure in this particular package within Konsole, which is the profile package, by the way.
KCachegrind. I chose this one because I just wanted another one, no reason really. It's huge again. Here, again, I'm interested in removing the cycles, but I can't even see the cycles, right? There's just a bunch of things going on. So you can use the cycle detection plugin to help you find the cycles within the code. It gives you a list of cycles, and you can click on a single item and it will paint that cycle. Then you can filter out everything else and see only the particular cycle that you're trying to solve. So let's get rid of the unused include one. Just to let you know, those two images show the same thing. In the first example, I just removed the dummy unused include, and then I want to proceed to inspect why the other cycle exists.
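The idea of listing cycles in a dependency graph can be sketched independently of the tool. The Python example below groups components that are mutually reachable using Tarjan's strongly connected components algorithm. The talk does not say how the cycle detection plugin is implemented, so this is only one possible approach, and the component names and the `find_cycle_groups` helper are hypothetical.

```python
def find_cycle_groups(deps):
    """deps maps a component to the set of components it depends on.

    Returns a list of sorted groups; each group has more than one member,
    and every member of a group can reach every other member.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in deps.get(node, ()):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            # node is the root of a strongly connected component
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            if len(group) > 1:
                groups.append(sorted(group))

    for node in sorted(deps):
        if node not in index:
            visit(node)
    return groups


# Hypothetical plugin whose main widget includes the plugin header back.
deps = {
    "plugin": {"widget"},
    "widget": {"plugin", "utils"},
    "utils": set(),
}
print(find_cycle_groups(deps))   # [['plugin', 'widget']]

# Dropping the unused include from widget to plugin removes the cycle.
deps["widget"] = {"utils"}
print(find_cycle_groups(deps))   # []
```

The recursive form is fine for a graph of this size; a very deep dependency chain would need an iterative version to stay within Python's recursion limit.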
So, starting from software that I had no knowledge about, I'm able to inspect it, find cycles, remove the dummy unused include cases, and then proceed to improve the software, removing the other cycles by really inspecting and going down to the actual code. And again, I had no idea about the code before I started, which is a good thing.
For this particular case, I opened the component and started to understand why there was a cycle between the two components. Then I went down to the logical level. When you open the .cpp file in this particular case, you have many, many classes. It was, I don't know, thousands of lines of code with many classes. It's a way to solve things, and I wasn't there, but it's more common to see code with one, two, or perhaps three classes per component. So one suggestion would be to split the component into several components, just to avoid this fake cyclic dependency. What is going on is that you have so many classes within a single component that when you need two of them, you bring in not only those two classes but all the other classes as well, and you end up with a cycle. But at the logical level, it's not a real cycle. This is the kind of thing that is hard to see for a newcomer, I guess. For someone who has never read the KCachegrind code, it would be impossible to know that this is going on.
And finally, for future work, we have some ideas. First, add more hooks to the plugin system. One idea we have is the Knowledge Island plugin. Let's say you have a team with people coming in and out, and you have no real control over that. But from the code, you know who works on what.
In the image on the left, you have a single component that I painted red because only one developer has ever altered it: one developer created it, changed it and did everything in that code. So it may be a good idea to put in more effort and bring in more people, to spread knowledge about that particular component. Green is when you have more people involved in the component. Orange is when you have two people or so. And red is when you have only one developer, and you may want to bring in more people to change that code, maybe write some tests, maybe do some bug fixes.
The code complexity and recommendations plugin is something like this: if you have a single component with many, many lines of code, you can paint it red, or bring up a box and filter those out using the plugin. These are all just ideas, by the way. They are not implemented yet. The idea of the code complexity plugin is to have something that tells you: this is a subset of your code that does too much. Please try to break it up into several components, or extract it into another package, or do something about it. It may be a point where you want to improve your code.
And community ideas. That's one of the reasons we are here. If you have an idea, if you want to inspect something specific or do anything with our plugins, please let us know and we will do our best to help you help us.
The other set of ideas is about exploiting the relational database. As I mentioned before, we read your code using LLVM and create a relational database representation of it, and then on top of that we visualize your code. But visualization is just one aspect, right? You can do many other things using the relational database.
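To make the idea of querying such a database concrete, here is a small Python sketch using the standard `sqlite3` module. The schema is made up for illustration (tables `component` and `dependency`, with invented rows); it is not the schema Codevis actually writes, and the talk does not specify the database engine. The query lists which components in one package depend on which components in another.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT, package TEXT);
        CREATE TABLE dependency (source_id INTEGER, target_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO component VALUES (?, ?, ?)",
        [
            (1, "editor", "app"),
            (2, "window", "app"),
            (3, "strings", "util"),
            (4, "paths", "util"),
        ],
    )
    conn.executemany(
        "INSERT INTO dependency VALUES (?, ?)",
        [(1, 3), (2, 3), (2, 1), (4, 3)],
    )
    return conn


def cross_package_edges(conn, src_pkg, dst_pkg):
    """Return (source, target) component names for edges from src_pkg to dst_pkg."""
    rows = conn.execute(
        """
        SELECT s.name, t.name
        FROM dependency AS d
        JOIN component AS s ON s.id = d.source_id
        JOIN component AS t ON t.id = d.target_id
        WHERE s.package = ? AND t.package = ?
        ORDER BY s.name, t.name
        """,
        (src_pkg, dst_pkg),
    )
    return rows.fetchall()


conn = build_example_db()
print(cross_package_edges(conn, "app", "util"))
# [('editor', 'strings'), ('window', 'strings')]
```

Once the code base is in tables like these, questions such as "what would I have to touch to break this package dependency" become ordinary joins rather than something that needs the visualization.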
For instance, I showed a bunch of examples where I found unused includes. Those can be automated, right? I can look at the relational database and write a Python script or other software that reads and changes the code and removes the unused includes for me. And then, boom, I don't need to do it by hand.
Include-what-you-use kinds of tools, too. Let's say component A depends on component B, and component B depends on component C. In component A, you can be using entities from component C without including them directly. Some people like to work that way, and that's fine. But if you don't, you can create a tool that requires an explicit include from component A to component C to solve this kind of issue.
Automatic binding generation. Like I said, you have a relational database representation of your code base. That means you have a table in a database filled with all your classes. You can iterate over all those classes and generate, I don't know, Python bindings, for instance, or anything else you like: go through all the classes, inspect which methods each class has, and create whatever you like with that information. Bindings are just one example.
And before the thank you, I need to make some acknowledgments. Like I said, the software is written by Codethink, and it's part of a bigger project sponsored by Bloomberg. Bloomberg has lots and lots of great, really smart people doing software development. Not many people know that they build software, to be honest. They have so many really smart people working with modern C++ and with really complex software, and they are actually hiring. Harald is over there. If you'd like to find out a little bit more about that, just talk to him and he will give you some more details.
For instance, they have John Lakos, who wrote the book. By the way, I am giving away two books for the first two questions. I will open for questions, and whoever asks gets a book. And finally, we open sourced the tool a few weeks ago. I don't remember exactly when. Anyway, it is on GitLab. Please take a look, hack around, break it and report back. We are actively taking suggestions. And that's it.
Okay. I'm the session chair, so I get to ask the first question. At the beginning you talked about translation units as the fundamental part you're working with. Do you integrate with the build system or meta build system to take things like CMake targets into account?
This is a good question. We are working on it. For now, we provide that set of command line tools to enable you to try to do it yourself. But Thomas is doing some work in this direction, and I think we will have more on that fairly soon.
I've got several questions, so are there any other hands?
So on paper, when you describe the pipeline, it doesn't feel that far-fetched that it might be made C++ agnostic. So I'm wondering how much of the implementation currently makes strong C++ assumptions all the way down to the model?
We have some assumptions. We've been trying to get rid of them, and I think the plugin system may be a way to do that. So the answer is that we have some assumptions and we are working on it. If you have any specific suggestion, if you try to use the code and something breaks and you think it should be better, just let us know and we will improve it. For the cases we've been trying, it all works fine, but I know there may be cases where you need some changes in the code. That's true, yeah.
Okay, can I ask you something? For the visualization part, you've shown only graphs, right? Dependency graphs.
The visualization part?
Yes, you've shown only graphs, right?
Nodes, edges.
Yeah.
I'm wondering, did you try to produce dependency structure matrices as well?
Not yet. You are not the first one asking for it, and we don't have it. There are a lot of ideas, right? This is one of them. It's been asked for before. We haven't done it yet, unfortunately, but it's something we know we need to do eventually, yes.
Okay, Kevin already took the question about other languages like QML, for instance. But another question I'd have is: what are you using for visualization?
What do we use?
Like, which library?
Basically, Qt.
Okay.
Let me answer again. We use Qt and it's awesome. It's great. We have no issues with it. It just works. It's amazing.
So, right now it's quite an expert tool. Have you thought about focusing on actionable advice, so that people who don't already know what's in the book can use it and start working right away?
No. If I remember correctly, we had some discussion in this direction, but we did not take it further. It's a good idea, I think. It's just not done. If this is something that interests you, open an issue and we will probably try to put it in our tasks and make it happen. Like I said, there are many suggestions and many ideas. We just need to prioritize them, pick the ones the community really wants, and begin with those.
Thank you.