The speaker is Andrei Chiș. Yes, it's me. So hello, everybody. I'm Andrei Chiș, and today I'm going to talk to you about moldable analysis with Moose. Currently I'm working in a very, very small consultancy company called feenk, so you've probably never heard of it. Our job is to help other software companies better understand their software systems and make better decisions about how to evolve them further. For that, we are using Moose. Moose, in very few words, is a platform for software and data analysis. It's completely open source under the MIT license, and its goal is to help developers, so you guys, write custom analyses easily. Today I just want to give you a gentle introduction to what that means, so you have an idea of what we are doing and how we are using this platform. In short, to make it easier to create those analyses, the focus in Moose is not on providing you with a huge list of predefined analyses that you can apply out of the box to your software systems. Rather, the focus is shifted towards what we call engines: tools for building other tools. So that's Moose in a word: a collection of engines that allow you to build the different types of tools you are going to use during an analysis. For example, there are engines for building various types of parsers in order to import data into Moose. John was talking earlier about SmaCC; that's one engine you can use to build parsers in Moose. And here we're not just talking about parsers for source code, because source code is just one aspect of a software system. You have log files. You have configurations upon configurations. You have deployment scripts, compilation scripts, any other scripts. You might have usage data. You might have real data.
You should be able to import all of those into Moose and reason about them. Then, once the data is imported, we need to store it somehow. For that, we have various engines to build different models to represent data, and then link it, intertwine it, and reason about it. The simplest one here is for building ASTs. And then, once the data is imported, we have engines to write various types of analyses. For example, you can think here of static analysis tools, or checking dependencies, or looking for duplication. But there are also others. Now, in the rest of this talk, instead of walking through a list of Moose features, I want to show you two use cases where we can use Moose to guide certain tasks that we have to do. The first one is going to be related to investigating deprecations. For this, we're going to use Java, and the system that we're going to look at is Lucene/Solr. That's quite a big Java system, and luckily, I found a lot of deprecated classes in there. If you're not familiar with deprecated classes, but I think you are, it's this little thing that you add to a Java class in order to remind yourself: I should remove this at a certain point in the future. But then a lot of times you don't remove it and you keep using it, and then the usages are everywhere. At a certain point, though, you will still want to remove them. What we are going to do in the next minutes is assess how hard it will be to remove the deprecated classes from Lucene/Solr, and see if we can do that in five minutes. So first, this is the main interface that you get if you load Lucene/Solr into Moose. You don't have to read this. It will say very generic things, for example: there are 75,000 methods, there are 12,000 classes, and so on. So you just get a brief overview of the system. And now we'll say, OK, I'm really interested in the classes, these 11,000, 12,000.
So then you can click, and you'll get a new pane to the right with that information. Here we have a list of 12,000 classes. Now, to find the deprecated ones, I definitely don't want to scroll, because that will take some time. I could, but I will not do that. Instead, we have here another pane where we can write a query against the data that we have loaded. So I want to write a query against those 12,000 classes to get only the classes that are deprecated. Let's do that. So this is the query. Can you understand it? Good. Who can't understand it? OK, so this is your first. The syntax of this language is Pharo. Pharo is the programming language on which Moose is built; it's an open source, object-oriented programming language. So we just say self. This is the current object, so the current list of classes. We want to select only those classes that are annotated with Deprecated. That's it. If we execute this, in a new pane to the right we get the list of deprecated classes. There are 36 deprecated classes here. Lovely. Now, can we remove them? No, because they might be used. One option would be to go to every one of these, click on it, and say: give me all the classes that call this. Is the list empty? Then remove it. So we would have to ask 36 times for the list of references to a class. But we don't want to do that. What we want is to write a query, and we can update our previous query to add another condition. So we want the classes that are deprecated and, I forgot some parentheses here, no matter, for which each clientTypes isEmpty. For each entity, and in this case each is a class, I can ask: what are your client types? What are the classes that are using you? It also has provider types: what are the classes that you are using? So in this case, I ask for all the client types, and the resulting list should be empty. Easy.
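The query on the slide is written in Pharo and is not reproduced in this transcript. As a rough illustration of the same two-step filter (deprecated, then no clients), here is a minimal Python sketch. The `ClassEntity` type, its fields, and the sample class names are made up for the example and are not Moose's API.

```python
from dataclasses import dataclass, field


@dataclass
class ClassEntity:
    """Toy stand-in for a class in an imported model (hypothetical, not Moose's API)."""
    name: str
    annotations: set = field(default_factory=set)
    client_types: list = field(default_factory=list)  # names of classes using this one


def deprecated_classes(classes):
    """Keep only the classes annotated with Deprecated."""
    return [c for c in classes if "Deprecated" in c.annotations]


def removable_now(classes):
    """Deprecated classes with no clients: nothing else refers to them."""
    return [c for c in deprecated_classes(classes) if not c.client_types]


def still_in_use(classes):
    """Deprecated classes that at least one other class still uses."""
    return [c for c in deprecated_classes(classes) if c.client_types]


# Made-up sample data, not taken from Lucene/Solr.
classes = [
    ClassEntity("OldParser", {"Deprecated"}, []),
    ClassEntity("LegacyFilter", {"Deprecated"}, ["QueryBuilder"]),
    ClassEntity("QueryBuilder", set(), ["Main"]),
]

print([c.name for c in removable_now(classes)])
print([c.name for c in still_in_use(classes)])
```

Switching between the "is empty" and "is not empty" variants of the condition corresponds to calling `removable_now` or `still_in_use` here.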
So if I run this, I actually get one result. There is one single class here that happens to have no clients. That one we can remove. Done. Now let's change the query. Here it says is empty, so let's make it not empty. With is not empty, we get the remaining 35 classes. And now that we have this, what would you do? [Audience:] We'll find the info in the deprecation annotation and replace the class with the new version. Yes, but we still need to find where it is used. Who is using it? [Audience:] We probably want to work first on those classes which are deprecated and which have only one usage. OK, so what we want to do is look at this list and see how many usages every deprecated class has. Maybe we have a deprecated class that calls another deprecated class, and then we can just remove them together. Maybe we have a deprecated class that's used in 100 places, or just in one place. So we need to be able to understand the structure of these deprecated classes and how they are used. And we want to do that in the next three minutes, because I'm running out of time. The idea would be then to build something that visualizes this. So let's do that. We can start from the previous script. Now I'm just naming the result: the previous list is in the variable classes. And we want to build a view. We want to show shapes that are circles, and if the class is annotated with Deprecated, I will color it in red. Then we want to display the current list of deprecated classes together with all their client types. So we're saying: give me all the deprecated classes, give me all their client types, put them in one list, add edges from all the client types, and color the deprecated ones in red. And then use a force layout in order to make some sense of this data. I guess we can write this in five minutes if we go slowly. And then we run this and we get this.
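The visualization script itself is on the slide, not in this transcript. The data preparation it describes (deprecated classes plus their client types as nodes, edges from clients, red for deprecated) could look roughly like the following Python sketch. The function name, the input shape, and the sample names are assumptions for illustration, and the force layout is left out.

```python
def build_usage_graph(client_types_by_class, deprecated):
    """Collect nodes, edges and colors for a usage view.

    client_types_by_class: dict mapping a class name to the names of the
    classes that use it (hypothetical input shape, not Moose's model).
    deprecated: set of deprecated class names.
    """
    nodes = set()
    edges = []
    for cls in sorted(deprecated):
        nodes.add(cls)
        for client in client_types_by_class.get(cls, []):
            nodes.add(client)
            edges.append((client, cls))  # edge points from the user to the used class
    # Deprecated classes are red, everything else gets a neutral color.
    colors = {n: ("red" if n in deprecated else "gray") for n in nodes}
    return sorted(nodes), edges, colors


# Made-up sample data.
usage = {"LegacyFilter": ["QueryBuilder", "OldParser"], "OldParser": []}
deprecated = {"LegacyFilter", "OldParser"}

nodes, edges, colors = build_usage_graph(usage, deprecated)
print(nodes)
print(edges)
print(colors)
```

A real view would hand these nodes and edges to a graph-drawing library; the point here is only how little data the picture needs.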
So now it cost us a few minutes, and we have a different way to approach this problem. All of a sudden we can see some interesting things. This one, easy. This one, easy. This one, who knows, who knows, who knows. So we now have an idea of the problem that we are facing. We're still not sure which one to start with first. I don't know. Is this one easier than this one? I don't know. So we need to go a bit further and investigate. We'll adapt the script with a few tweaks. I will show you the script at the end. The first thing I would like to do: in this Lucene/Solr project, as the name says, there are actually two projects, Lucene and Solr. So first let's highlight these classes in blue if they are from Lucene and in green if they are from Solr. Then we can see more interesting things. Most deprecations actually seem to happen in Lucene. This cluster here is mostly Lucene. This one here also. This one here and these two will be the more complicated ones because they are mixing both, this one even worse. So maybe we can start with this one at the beginning, as it's just one system. Or we can start with a smaller one. It depends. You will see in a second. So now we can make a plan; we have a bit more information. But still, how big are these classes? Maybe this one here is a monster and maybe these are all small. So let's add one more step and make the size of a circle equal to the number of methods in that class. And then we get a different story. Now it turns out that one of those deprecated classes is actually huge. This one I don't want to look at. This seems scary. This seems more manageable. But maybe even this one is huge and this one is huge. For now I just put the number of all methods there, so the size of a circle is given by the number of methods. That's it.
But maybe half of those methods are not used from outside the deprecated classes. We're just interested in those methods from deprecated classes that are used from somewhere, and in all the other methods that are using them. The rest we can ignore. So we tweak our script with a few more lines and then we get this. Now we have all the methods of the deprecated classes, and from the other classes we only show the methods that are using a deprecated class. So now we have a different story, and we can actually start to get more info and see what's going on here. If you look here, it seems that all of these classes just have one method that's using another deprecated method. Those should be easier to do. Okay, this was the big one, but it's just three methods using another three methods. We have no problem there. This one is problematic indeed. This one seems problematic. Here it also seems that just two or three methods are using the deprecated ones. Quite easy. But we can do one more thing. Let's change the size of the circle of a deprecated method to be equal to the number of other methods that are calling it. So the bigger the circle is, the more the method is called from other places. Now we have, again, a different story. And now we can actually start making decisions, and we have all the information that we need here. It turns out that there are two methods that are really problematic. These two methods, if we want to remove them, they are used everywhere, this one and this one. Then here, there are still a few, but it's more manageable. Here it turns out there is just one. So now, in just about five or six minutes, we were able to move through a huge project and reason a bit about what we can do about this problem of removing deprecations. And if we go a step back, this is the code that we need to write in order to do this.
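The script shown on the slide is not part of this transcript. As a separate illustration of the last sizing step only, here is a small Python sketch that counts, for each deprecated method, how many distinct methods call it. The call-pair input and all method names are invented for the example.

```python
from collections import defaultdict


def caller_counts(calls, deprecated_methods):
    """Return {deprecated method: number of distinct calling methods}.

    calls: iterable of (caller, callee) method-name pairs (hypothetical input).
    deprecated_methods: set of method names that belong to deprecated classes.
    """
    callers = defaultdict(set)
    for caller, callee in calls:
        if callee in deprecated_methods:
            callers[callee].add(caller)  # a set, so repeated calls count once
    return {method: len(callers[method]) for method in deprecated_methods}


# Made-up sample data.
calls = [
    ("A.run", "Old.parse"),
    ("B.run", "Old.parse"),
    ("A.run", "Old.parse"),   # same caller again, counted once
    ("C.go", "Old.close"),
    ("C.go", "New.open"),     # callee is not deprecated, ignored
]
deprecated_methods = {"Old.parse", "Old.close", "Old.reset"}

sizes = caller_counts(calls, deprecated_methods)
print(sizes)
```

Using the count as the circle size makes the heavily used deprecated methods stand out, and a count of zero marks methods that can go without touching any caller.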
It still takes some time to write, but all took me around one hour to do this whole scenario of thinking, what do I need to do, does this help me, doesn't this help me. And notice here, there is no scroll bar. So it's not that much of a code. And that's what I mean by custom analysis with Moose and by allowing developers to write custom analysis. This is just code. It's just a code that uses some provided libraries in order to make sense of some data that we have. But then this is not just a static picture. So I run the script and I got a picture. For example, we can also select highlight. So if I go over a node, so if I go over this method here, I can see the list. I will get highlighted all the links with all the other methods that it calls. But now it might be the time when I want to read the code of this method. So until now, we reasoned about this problem and we never looked at source code. But we could make a lot of progress without having to read any code. But now will be the time when we should read the code of this method maybe. So then we can select it. And whenever we select a method here, we have the model entity that represents that method. So we can open a new pane to the right where we can show the source code. So now we can look and see does this make sense? What does this method do? Now we can start to do the actual task of removing it. And if we take here a step back, so using this scroll bar at the bottom, we can change or expand how we look at our session, how we look at our exploration. So we started initially with some overview of the system. Then we selected the classes and we wrote a query on those classes. Then we build a visualization for the result of that query. And then we use the visualization finally to explore source code. So it's in the same workflow. We have three different very conceptually different things. So writing a query, building visualization and exploring source code. And from here we can continue. 
We could say now let's look at the senders of this, or let's continue. So we can always continue the navigation. You don't have to stop at a certain point. And you can always go back and see: well, how did I get to this point? What was my query? What was my exploration session until this point? Questions for this one? Yes. [Audience question, inaudible.] So I'm not sure I got the question, but we can come back to it afterwards, because otherwise we are losing some time. Yes? [Audience:] Do you have a universal language model for all supported languages? So he asked if there is a universal model for all the languages currently supported in Pharo. We have a model, it's called FAMIX, but it's mostly for object-oriented languages. It's built with an engine, though, so with that engine we can build a model for whatever language you prefer. But right now, out of the box, we have it for object-oriented languages, not for any other language. Okay, so now let's move to a second use case. The first was Java, and that's maybe boring. The previous one was more or less an example that I picked just to show you some things that we can do with Moose. This one is actually from a real client that we worked with, to help them with their Angular application. Their problem was that they had a big Angular 1 application, around 300,000 lines of Angular 1. So it was quite a big monster. It was a big monolith, and they were trying to decouple it into a few components so that they could develop those independently. Until then everything was in one place and everything was connected with everything, so it was not as easy to develop and to maintain as they would have wanted. So we said okay.
However, we had nothing in Pharo for modeling Angular or anything like that. So first I will give you a very quick introduction to Angular and how dependencies are specified in Angular, because that was the problem there. We needed to understand all the dependencies between those Angular things and then make them explicit. In Angular there is JavaScript and there is HTML. In JavaScript, more or less, you have modules that define components, and those components can be injected into other components. Then you have templates that define the actual HTML, and templates can refer to components. It's more or less like React, but with slightly different terminology. Now let's look at a very simple code example. I hope you can read it; I tried to make the font as big as possible. We have a module, and in this module we are using a template. So we define templateUrl and give the name of the template, in this case module X. Then in the template of module X we have a component: we're using some-component-b. This means there will be another module that defines that component, so it's someComponentB. And this module is using another template, module Y component B. And this template uses another component: it uses some-component-a, which happens to be defined in the module above. So now we actually have a cyclic dependency between these. If you try to remove one of them, you will get errors because it's cyclic. Ideally we don't want to have these kinds of things. But to detect this automatically, we have to find all these dependencies. And notice here, someComponentA is specified in camel case, and then it's used here in a different way, with dashes. So you can't just grep. That would be ideal. I don't know who designed Angular or why they did it like that.
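To make the naming problem and the cycle concrete, here is a small Python sketch. It is not the importer described in the talk: it assumes components appear only as element tags in templates, and the module names, templates, and helper functions are invented for illustration.

```python
import re


def to_dash_case(name):
    """Turn a camelCase component name into its dash form, e.g. someComponentA."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def components_used(template_html, component_names):
    """Component names whose dash form appears as an element tag in the template."""
    tags = set(re.findall(r"<([a-z][a-z0-9-]*)", template_html))
    return sorted(n for n in component_names if to_dash_case(n) in tags)


def module_dependencies(modules):
    """Map each module to the other modules whose components its template uses."""
    owner = {c: m for m, info in modules.items() for c in info["defines"]}
    deps = {}
    for module, info in modules.items():
        used = components_used(info["template"], owner)
        deps[module] = sorted({owner[c] for c in used if owner[c] != module})
    return deps


def has_cycle(deps):
    """Depth-first search that reports whether the dependency graph has a cycle."""
    state = {}

    def visit(node):
        if state.get(node) == "active":
            return True   # reached a node that is still on the current path
        if state.get(node) == "done":
            return False
        state[node] = "active"
        if any(visit(n) for n in deps.get(node, [])):
            return True
        state[node] = "done"
        return False

    return any(visit(n) for n in deps)


# Made-up modules mirroring the example: X uses B (from Y), Y uses A (from X).
modules = {
    "moduleX": {"defines": ["someComponentA"],
                "template": "<div><some-component-b></some-component-b></div>"},
    "moduleY": {"defines": ["someComponentB"],
                "template": "<some-component-a></some-component-a>"},
}

deps = module_dependencies(modules)
print(to_dash_case("someComponentA"))
print(deps)
print(has_cycle(deps))
```

A plain text search for `someComponentA` would find the definition but none of the usages, which is why the names have to be normalized before any dependency can be resolved.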
But you cannot just use search to find dependencies between components. That's never a good idea, ever. I found the same problem in PL/SQL, which is a very old language, so it seems that things are repeating themselves. So we had this, and what we said is: okay, let's build infrastructure in Moose so that we can import this and analyze it. We import the JavaScript, we import the templates, and then we get all the components and all the dependencies between these components. We did that; it took a few days to get everything working correctly. And then we could write this query. For example, we could write a view: give me all the Angular templates and give me all the Angular components, because we had them imported in our model. Then we can build a few more things in this view. You could say, okay, show me some lines: I want to see edges from each Angular template to the templates that it includes, and from each Angular template to the components it uses. So just draw lines from a template to other templates and to components. Then apply a force layout or something. And then we got this. This was the picture with all the modules and all the components and their connections together. Now, with the previous view, the one with the deprecations, we used the view to make a decision. There we could say, okay, this one has many deprecations. This view is not really useful for making a decision immediately, but it gives us an overview of the mess and the complexity that we had to deal with. The task that we had to work on with them was this: they had two modules, module A and module B, and they wanted those to be split. So we said, okay, how widespread are those modules? We tweaked the script with a few lines, and now red is module A and blue is module B. So this is the scope of the issue.
Still, this view is not useful to make a decision. It just shows: okay, this is the problem, deal with the problem. What we said then is: okay, ideally we could take all the red things and move them in one place, take all the blue things and move them in one place, and it will be fine. Except that some components, and I'm not sure if I can find one, are using both A and B. So if I have a component that's using both module A and module B, then I have to do something about it. So what we did was say, okay, let's look at one component individually. For example, we could say: give me all the Angular components that have, so this is a made-up name, that are for example in a certain folder. And then, if we have those components, we can build a different view that shows them. Now we can see again the components with all their dependencies. For every component we highlight it in red if it's used by module A, in blue if it's used by module B, and with both colors if it's used in both. It means that these components that are used in both are the problematic ones. So now we actually have a tool that allows us to look at a set of components and see: are these components okay, or should these components be touched? Then we can write a query to detect all those components. And this is how we can make decisions and make sense of this nice mess. Also from here, for example, I can click on one of these and get the source code that is behind it. It's blurred because it's source code from the company. We actually used this tool for a few days, and then it was almost clear what the plan was and what should be done.
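The red, blue, and both highlighting described above comes down to a small classification step. Here is a minimal Python sketch of it, with invented component and module names and a plain dictionary in place of the imported model.

```python
def classify_components(used_by, module_a="moduleA", module_b="moduleB"):
    """Label each component by which of the two modules uses it.

    used_by: dict mapping a component name to the set of module names
    that use it (hypothetical input shape).
    """
    labels = {}
    for component, users in used_by.items():
        in_a = module_a in users
        in_b = module_b in users
        if in_a and in_b:
            labels[component] = "both"   # shared: needs a decision before splitting
        elif in_a:
            labels[component] = "red"    # only module A
        elif in_b:
            labels[component] = "blue"   # only module B
        else:
            labels[component] = "none"
    return labels


def problematic(labels):
    """Components used by both modules, the ones that block a clean split."""
    return sorted(c for c, label in labels.items() if label == "both")


# Made-up sample data.
used_by = {
    "datePicker": {"moduleA", "moduleB"},
    "orderTable": {"moduleA"},
    "invoiceForm": {"moduleB"},
    "footer": set(),
}

labels = classify_components(used_by)
print(labels)
print(problematic(labels))
```

The list returned by `problematic` plays the role of the follow-up query mentioned in the talk: it names exactly the components someone has to look at.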
And before that they had spent a few months trying to do this refactoring without success, because the dependencies for them were implicit. They were not exposed. They had never seen them before. So you're just fixing one thing, running, and hoping it will work. Okay, so these are the two main things I wanted to show you. Now, I just showed you Moose and how it works, but how do you integrate Moose into your development process, for example into Scrum if you're using Scrum? For that, there is also a methodology called humane assessment. It's also freely available and open source. In this we start with a hypothesis; the rule is, have a hypothesis. Can we refactor this component into two, or can we split this into two? Then we apply an analysis. Then we interpret: are we confident? If yes, we do something; if not, we go around again. But the problem here is: what happens if there's no analysis? If I have a question and I need to answer it and there is no analysis, what do I do? The idea is that you build the analysis, and this is how we are using Moose. For us, Moose supports this process when we have hypotheses for which we want to apply an analysis but for which we don't have the analysis. So we're using Moose as a tool to build those analyses, to help us make decisions and steer systems in one direction or another. If you want to find out more, please go to moosetechnology.org. You can join the community; it's open source and we'll be happy to have you on board. So thank you very much. Yes, please. [Audience:] I have two related questions. The first is: what kind of library or algorithm do you use to draw your graphs? The second, related one is: when is it worth drawing a graph and when isn't it, because it's so messy that no human mind can understand it? Exactly. So the first question was what libraries we are using to draw graphs.
So we have our own implementation of a framework for drawing different types of graphs. So that's custom-made. And second, of course, if you have the graph, so the second question one was when is it worth it to draw your data as a graph? So that's the exploration part of Moose. It's never by default you should draw just your data as a graph because if you draw all your data, then for sure it will not make sense. So the approach that we're doing in Moose is this iterative way. Where we start with some data, we try to visualize it and see, can we make a decision based on this data? Is this data in a way that we can show somebody, for example, somebody non-technical and help him make a decision? And if not, we need to reiterate. What do we need to change? Do we need more data? Do we need to change our way or visualize it? So it's always a iterative process of finding the right way, the right visualization to show based on the data that we have. Yes, please. I noticed in your two examples the focus was on dependency relationship and code. Is that a common theme for you in most of your projects as analyzing dependencies in code? Yeah, that appears quite a lot. It's not the only thing that we do in Moose, but I wanted to just show one thing here. So this is, yeah, but indeed, this is the main thing that we're using in Moose. Analyzing dependencies or analyzing the structures, seeing how we can split systems or how to control dependencies as systems evolve. That's a big part, yes. Yes? Is it possible to use your existing engines to be used automatically in IDEs and quick fixes in this, like in the JND? Okay, so the question is if it's possible to use our tools to make quick fixes or other automated analysis in IDEs. So we don't have anything that works with that by default, but it's possible. So what we have that works is, for example, you can write some queries and have them as unit tests. For example, I could say I don't want module A to depend on module B. 
Or I want there to be just one: this is the only dependency allowed between module A and module B. You can write that and have it as a unit test that you can run in Travis or in your continuous integration, for example. So that's some part of it. But not really for other IDEs currently. [Audience:] I will use your tools today. Good. [Audience:] So why do tests? Okay, time's up. So thank you very much.
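The dependency rule from the last answer (module A may depend on module B only through one allowed dependency, checked as a unit test in continuous integration) could be sketched in Python as below. The dependency pairs, names, and the `violations` helper are made up for illustration; in practice the pairs would come from the imported model.

```python
def violations(dependencies, source_prefix, target_prefix, allowed):
    """Dependencies from source to target that are not on the allowed list."""
    return [
        (src, dst)
        for src, dst in dependencies
        if src.startswith(source_prefix)
        and dst.startswith(target_prefix)
        and (src, dst) not in allowed
    ]


# Made-up dependency pairs standing in for what an importer would produce.
DEPENDENCIES = [
    ("moduleA.orders", "moduleB.billing"),
    ("moduleA.orders", "moduleA.cart"),
    ("moduleB.billing", "moduleB.tax"),
]

# The single dependency that is allowed from module A to module B.
ALLOWED_A_TO_B = {("moduleA.orders", "moduleB.billing")}


def test_module_a_depends_on_module_b_only_where_allowed():
    # A test runner such as pytest would pick this function up by its name.
    assert violations(DEPENDENCIES, "moduleA.", "moduleB.", ALLOWED_A_TO_B) == []
```

If someone adds a new dependency from module A to module B, the test fails in continuous integration and names the offending pair.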