In general software engineering, there are different ways of approaching the problem of code quality and long-term maintainability. The first is to make sure we have the best programmers: very careful, very diligent people who document their code so thoroughly that it reads like a book, so that you understand everything written there. The second approach is some kind of automatic checking, where people track different statistics. And the third approach is to rewrite the code every once in a while, modernize modules, and occasionally make sure the current staff still remembers what has been written. If these precautions are not taken, it's very likely that the code will become unreadable after a certain period and very hard to maintain.

There are different metrics that signal this. One is the defect ratio per code line, or per code fragment. Another statistic used as a signal that the code is not very good is the likelihood that each bug fix, correction, or change introduces a new bug. A famous point in software evolution was when researchers at IBM discovered that each bug fixed in their operating system had a chance of introducing more than one new bug during the change. That basically means that at that point in the software's evolution, with the statistics they had, it was statistically impossible to decrease the overall number of bugs in the code. The code base was pretty big back then.

I was recently working with bigger and bigger code bases, and I noticed that sometimes you can just see in the code that certain things are difficult to read. So I thought I would borrow some of the techniques people use for other programming languages to estimate how flaky or difficult to read code is for the programmer. That's how homplexity was born. Basically, what it does is compute metrics about how the code looks. It's very incomplete now, at a very preliminary stage, but I welcome your comments and suggestions about what seems reasonable.

So for code quality, besides defect likelihood, we want to know how complex the code is, how difficult it may be to write or rewrite, and how understandable it is. These are the usual markers. At the small scale there is a very nice tool by Neil Mitchell, HLint, that suggests improvements to the style of Haskell code: how to rewrite small fragments to be more concise, to use higher-level constructs like structured recursion with folds or maps instead of explicit recursion. There are also tools that tell you whether all your symbols, functions, and data types are commented. It's now recommended practice to have at least Haddock comments: comments that explain the intent of each function, unless it's really apparent from the name, plus Haddock comments for arguments saying what each argument does, especially if there are at least two arguments of the same type.

We can also measure code complexity by the number of functions per module, by the number of interactions between functions (which calls which), and by the number of function arguments and record fields. The famous research in the psychology of memory suggests that humans can hold three to nine units in working memory at any moment. That is not necessarily very precise, but it probably means that beyond nine arguments it's impossible to hold them all in memory at the same time.
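To make that concrete, here is a hypothetical sketch (all names are invented for this example) of taming an argument count by grouping related arguments into records, which is also where the record advice below comes from:

    -- Eight positional arguments are hard to hold in working memory,
    -- especially when several share a type:
    renderBox :: Double -> Double -> Double -> Double
              -> String -> String -> Bool -> Bool -> IO ()
    renderBox x y w h font color bold italic = putStrLn "render"

    -- Grouping related arguments into small records brings the count
    -- down to something a reader can actually track:
    data Rect  = Rect  { rectX, rectY, rectWidth, rectHeight :: Double }
    data Style = Style { styleFont, styleColor :: String
                       , styleBold, styleItalic :: Bool }

    renderBox' :: Rect -> Style -> IO ()
    renderBox' rect style = putStrLn "render"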
So it would probably be preferable to break such functions up into functions that take more like three to five arguments. The same applies to record fields: it would be best to break a record with nine or ten fields into, say, a record of records. If the record holds just the x and y coordinates of two points, that may not be a big deal; but if it holds the coordinates of two points plus the contents plus something else, like some statistics, then each of the points can become its own data structure, which in the case of the points halves the number of components.

The next thing is the number of types and classes. It's often said that too many novel type classes can easily obscure the code, and the same goes for too many types, if they do not represent new concepts or something we want to abstract over. And lines of code: this is the most typical measure, often used for estimating the cost of rewriting software, and, when combined with function points, used to estimate complexity and efficiency. There are also the so-called McCabe and Halstead metrics, which basically count the operators and symbols in each fragment of the code and estimate from their ratios how difficult it is to read. That roughly corresponds to the Flesch-Kincaid readability metric for the English language.

For understandability, we can measure the number of comments per function, class, type, or module, or per line of code, or run Flesch-Kincaid to check the readability of the English inside the code.

Of course, the tool's goal is to highlight the complex code fragments. More practically, we want to provide the value of the metrics for each code fragment and guide refactoring. For each fragment that crosses a particular warning threshold, we report it. If everything is fine and nothing crosses a threshold, we just say there are no warnings in this code: it looks well decomposed and well refactored from the point of view of the metrics. There may be other factors that matter, but from the point of view of the metrics it looks okay. Of course, it's purely statistical: if somebody puts "ha ha ha ha ha" in the comments, that will not really help.

The ontology I use in homplexity is that we have validation criteria based on different metrics. These metrics have thresholds, and depending on the thresholds we emit messages of different severities. The metrics are applied to different code fragments, so the same metric may be applied to different kinds of fragments, such as a function or a module. Then there are different kinds of metrics, among them comment metrics, type metrics, and code metrics.

Of the metrics we check for comments, the most important is the code-to-comments ratio, which actually mixes in code metrics too. This is only partially implemented now, but it's coming soon. For the code, we first measure lines of code; that's pretty apparent. Then cyclomatic complexity: how many different paths there are through each function, how many different ways its conditions can be satisfied. For the simplest example, if you have a function that just adds two numbers, its cyclomatic complexity is one. If there is an if in the function, the cyclomatic complexity is two, because you have a then-arm and an else-arm.
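A minimal illustration of that counting, with invented example functions:

    -- A single path through the function: cyclomatic complexity 1.
    add :: Int -> Int -> Int
    add x y = x + y

    -- The if splits execution into a then-arm and an else-arm:
    -- cyclomatic complexity 2.
    absolute :: Int -> Int
    absolute x = if x < 0 then negate x else x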
So: how many different paths can you take through the same function? If you have a case expression with eight arms, and that's the only conditional in the function, it would be eight.

Then branching depth. If you have a case within a case within a case expression, not only may the cyclomatic complexity grow; the code is also difficult to read, because there are, say, five nested branches. So homplexity will warn about this as well. And the total number of expression nodes per function: sometimes the number of lines of code doesn't really reflect the length of a function, because there are so many nodes and so many things per line that it's really difficult to read what's inside. Such a function should probably be decomposed into separate sub-functions or local functions. Why is it dotted on the slide? Because it's work in progress, basically. I should also dot the code-to-comments ratio, because it's not fully implemented. Everything else is fully implemented.

For the types, I count the number of nodes in the type tree. I think all of us have seen types that look like monsters: they span three lines and have twenty type nodes inside, twenty different type operators and variables and so on. They are difficult to read. So I issue a warning for any type that is basically too long, because you can use newtypes or type aliases to abstract over its components. Endo is the simplest example: any function from a type into the same type can be written as Endo a. There are much more complex types that are better expressed as named abstractions. And the number of function arguments: for this I basically analyze all the type signatures in the code and run these checks on each of them. If there are any other suggestions about this, I welcome them. This doesn't cover type classes yet, of course.

For the comment metrics, besides the total number of comments, I'm also about to implement the Flesch-Kincaid metric, which basically divides the number of words in a sentence by the... if I remember... oh, I forgot. If Lawrence were here, he would probably recite it. A simple ratio over the lengths of the sentences.

I was actually surprised how easy this was to implement. I took haskell-src-exts, which is a full-fledged parser for the Haskell language, then used Uniplate to descend into each expression or fragment of the code, limited myself to checking only certain kinds of fragments, and built the metrics as functions from there. So the code is very short and concise. The code fragments I have are basically types, functions (global functions), and modules for now. That's sufficient. In the future I will also treat classes separately, because it would be nice not to have too-large classes, for example, and records too; but so far it has been very simple. The metrics can easily be abstracted over any object that contains, say, expressions or types: a metric is just a function from a code fragment to a metric value, and the only things we need to do with a metric value are to check it against a threshold and to show it. So it's also very easy to reuse the metrics. And then we have thresholds, configurable for each metric, that assign a severity at each threshold level.
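A minimal sketch of that design, assuming haskell-src-exts and uniplate; the names Metric and exprNodes and the threshold value are hypothetical, not homplexity's actual API:

    import Language.Haskell.Exts
      (Exp, Module, SrcSpanInfo, fromParseResult, parseFile)
    import Data.Generics.Uniplate.Data (universeBi)

    -- A metric is just a function from a code fragment to a value.
    type Metric fragment = fragment -> Int

    -- Count every expression node anywhere inside a fragment.
    exprNodes :: Metric (Module SrcSpanInfo)
    exprNodes fragment = length (universeBi fragment :: [Exp SrcSpanInfo])

    main :: IO ()
    main = do
      ast <- fromParseResult <$> parseFile "Example.hs"
      let nodes = exprNodes ast
      -- Checking the metric then reduces to comparing with a threshold.
      putStrLn $ if nodes > 500    -- hypothetical threshold
                 then "warning: " ++ show nodes ++ " expression nodes"
                 else "ok"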
The roadmap for the future is to add more code-to-comment metrics, and possibly to calibrate the metrics the way Flesch-Kincaid was calibrated: by checking the defect density in some open-source projects and comparing it with the complexity metrics homplexity derives, because the statistics we know are mostly for C code. They probably do not reflect the complexity of Haskell code. So that would be it. But now comes the best part, because that was the presentation, and that was boring: you can actually run it.

It's pretty simple. You can basically cabal install homplexity, because it is already on Hackage. I was actually surprised, because one day after I put it on GitHub, I got the first patch. And if the latest version is not on Hackage, you can get it from GitHub; my GitHub is here, it's basically mgajda/homplexity, very simple. So you can install it from there. It usually takes a while the first time, because you need to get haskell-src-exts and it takes a while to build, but homplexity itself is pretty quick.

And let's check what we have. I built it in the home directory, but what do we have for the project itself? Hmm. Oh, not this one. Ah, because I did something: I symlinked the directory with my other project, so, okay. So it tells you: there is a type signature for the function measureAll that has six arguments, and this should probably be less than five. That's a warning, not really serious. And it correctly parsed 14 of 14 input files; when it cannot parse a file it doesn't crash, it just reports that the statistics are incomplete. This looks good because I already checked this source code before I committed the latest patches.

But how about my previous project? It's a bit bigger, and you see, there are certain problems. There is a function with cyclomatic complexity of 52. In C, it's usually thought that a function with cyclomatic complexity above 20, so 20 different conditional arms, for example, should be flagged, because there is a very high likelihood of a defect. And there is a peculiarity of Haskell here: we sometimes have case expressions with really many arms, because we have algebraic types with possibly a hundred constructors. I'm not sure what to do about that, because they sometimes make sense. Maybe when it's a case with one arm per constructor, we should ignore the warning; but when it's not a case over a type with many constructors, it probably is a valid warning and we should correct the code (a sketch of this situation follows below).

Next, I have a type signature for FGATOM that has eight arguments. I must say, I recall that FGATOM was actually an abstracted sub-record shared between different records. It was giving me difficulty, but I never really reflected on why. I can also say that the SHEET and HELIX parsing functions were a problem only for some time, because they are pretty well tested; I had a lot of unit tests for those. Next, a type signature for PRINTLIST has six arguments, should be less than five; that one didn't actually give me much trouble during implementation. Next, parseAniso. Ooh, this was ugly. It's just 24 lines of code, but during unit testing it gave me a lot of problems, because the format is actually not well specified on the PDB website, so that problem was probably not in the code itself.
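As promised above, a hypothetical sketch of the many-constructor situation (imagine a hundred constructors instead of five; all names are invented):

    -- A case with one arm per constructor inflates cyclomatic
    -- complexity, yet is arguably unavoidable and easy to read.
    data Element = H | C | N | O | S   -- imagine ~100 of these

    name :: Element -> String
    name e = case e of
      H -> "Hydrogen"
      C -> "Carbon"
      N -> "Nitrogen"
      O -> "Oxygen"
      S -> "Sulfur"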
Next: the function guessElement has 65 lines and also pretty high complexity, so it should probably be shorter. But this case should probably be ignored too, because I know what guessElement does: it has to guess the chemical element from the atom name in a PDB file when the element is not explicitly assigned, and since the PDB has a great many atom names, it's basically a case-by-case analysis. CA means carbon alpha, which means the element is carbon. There is absolutely no rule, so we have to go through the names one by one, and so on (a sketch of this follows below). Some of these warnings may be good. All of them should be usable in Vim or Emacs or a similar editor in quickfix mode, the mode where you get warnings highlighted line by line. And you can adjust the severity; that was actually the first user request. You can set your own thresholds.

Audience: It's a little bit evil when you have something like: no, no, I know this one is fine, please don't yell at me again about this.

At the moment there is no annotation to tell it not to yell. It will be implemented; but in most cases, actually, if you think about it, I split the function into smaller parts, so in most cases these warnings were valid. Possibly the one weak point is these many-armed cases; though if it's just a simple case expression, that usually really is a signal that there are many combinations you need to treat one by one.

Audience: Is there an annotation for that in the code? An annotation that says: do not yell about this warning. In the code, like in a comment?

Yeah, basically, yes. I would do it the same way as HLint, and many other linters: you indicate in a comment "no warn" and the name of the warning. If there are better suggestions, I welcome them. I understand that you would want to do it somehow as a user-side tool, but the problem I'm afraid of is that people would start submitting tool-specific annotations as patches to a project that I maintain; I would reject them for sure, and I don't want that. I'm not sure, it really depends. In many cases, what I've seen when I checked the GHC code is that the worst parts it flagged really were the worst parts: functions with a hundred lines of code and triple-nested conditionals that should be split into smaller pieces.

Audience: I was really asking about the annotations, not about...

Yes. So probably there should also be an option of keeping the suppressions outside the source code, in some dotfile for the project. Then you can just add this dotfile and it goes to work.

Audience: From a user perspective, I would love a prioritized list. We have different metrics, right? We could combine them, and there is probably a hotspot that needs the most attention.

Yes, that's true. That's on the roadmap. At the moment I do not yet know how to combine them, because I haven't done the statistical analysis; I think that would be the next step after the statistical analysis.

Audience: So currently they are just printed in the order they are discovered?

Yes. There is also an option to filter by severity; I normally run it with errors only first, that's usually the first threshold. But the output can also be sorted by metric value.
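Returning to guessElement: here is a sketch of its shape, together with the kind of suppression comment proposed above. Both the comment syntax and the warning name are hypothetical, since nothing like this is implemented yet, and the real name table has hundreds of entries:

    -- NOWARN: high-cyclomatic-complexity   (hypothetical suppression)
    -- Guess the chemical element from a PDB atom name when it is not
    -- explicitly assigned.
    guessElement :: String -> String
    guessElement atomName = case atomName of
      "CA" -> "C"    -- carbon alpha, i.e. a carbon atom
      "CB" -> "C"
      "N"  -> "N"
      "O"  -> "O"
      "FE" -> "Fe"
      _    -> "?"    -- unknown atom name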
Audience: There's another interesting thing you would probably want to combine with this. There's Code Climate, and what Code Climate also gives you is how much churn a particular part of the code gets. You could also get that from the GitHub history: they look into the history, and the code that changes more and has a lot of problems goes to the top.

Yeah, that's probably a very good idea, to implement a GitHub query like that.

Audience: You can see on the screen right now that you have a problem with the parser.

Yes. There are still issues; as I said, it's a very preliminary version. Sometimes it fails to parse a Haskell file, and usually it is either extensions that were not recognized by the parser — haskell-src-exts does a preliminary scan to decide which extensions to use, and I'm very permissive about these extensions, but sometimes some are still missed — or I miss the CPP macros, because most of these macros are actually cabal macros. I could just include the cabal macros header file, but I will have to add an option to put it on the include path, because right now it just doesn't find it. I talked to Roman Cheplyaka about this, and I think he mentioned a way to run these static tools through cabal. That's probably the best idea: this is only Haskell code, and that is exactly what cabal is for. I probably need to ask him later about the details, because I haven't used it yet; I just know about cabal exec. And maybe cabal exec is sufficient, because I already use cpphs for preprocessing the code.

Audience: Do you pipe the files through cpphs?

No, I use it as a library. The only problem is that for cpphs to work, it needs to know the macro definitions at that time, and in my case I don't pass it any parameters that would tell it which macros to use.

Audience: This is a pretty technical exercise, right: analyzing the code, you get a couple of warnings telling you what is good and bad in your code. Do you have code somewhere that it marks as bad, and the same code somewhere after changes, where it would be categorized as looking better?

Yeah. The way such statistical analyses are usually made is to check a git or similar repository where you can see the changes, how often they are marked as bug fixes, and how often they are applied to the supposedly worse parts of the code. If a code section gets a lot of bug fixes and it also shows bad statistics, then the signal is correlated. The other way is just to ask people how they feel about reading it; and usually, if the statistics are really bad, that's a clear signal that people will not read the code easily.

Audience: That is where I actually asked for a sample. You said the tool was probably used to guide its own development. So if you had a sample of code you wrote that wasn't so good, how would it have been changed?

So, I applied the tool to itself and refactored it slightly; that helped with some of the code fragments. And I looked at the hotspots that I remember from my previous project.
There was still some correlation between the parts that gave me difficulty and the parts that were flagged as potential hotspots. So that's all I have for now. Possibly later on one could apply it to Rosetta Code and check it there; there is something similar.

Audience: Is there a possibility for the project to detect raw types? Like passing a Boolean around when a Boolean is not actually very useful — you could define your own Boolean-like type properly, so you pass Pass or Fail rather than just True or False. Or using a Float when you want a Price or a Quantity.

That is a different thing, and I don't yet have an idea how to do it. But what you possibly can do is check for the common words that appear in names of functions that always take a Double. So if checkPrice, addPrice, and subtractPrice all take a Double, maybe they should take a Price parameter, which could just be an instance of the Num type class or something like that. I don't yet know how to do it; so far I haven't gone that deep.

Audience: I do JavaScript, and one of the hassles there is typing everything.

Yeah, that's true. It's certainly much more complicated than counting the number of arguments, and the current version is not very sophisticated: helpful, maybe, but not that sophisticated. The signals that maybe you should introduce a type are usually detected when you apply something to the wrong type, when you have an implicit invariant. So I would say it's easier to detect after the fact: an error is raised because I again applied something to the wrong thing, and then I would introduce the type. But there may be other cues for this refactoring too. A very strong one, besides an exception report or a failure to apply something to the right thing, is that you have difficulty remembering which kind of Double it was, say, one time out of ten. Or that you are adding, as I said, the word "price" to twenty different function names. And the argument-count rule should probably be weighted by this: it's kind of okay to have five or six parameters if they all have different types, but if there are four Ints or four Strings, something is definitely wrong, and it's probably a case where we have to introduce some type aliases, like introducing Price instead of Double. So it's not just the raw number of arguments; the number of arguments of the same type may be a very good metric. I would gladly add it.

No more questions? Okay. So, as I said, Zonka promised an introduction to algebraic data types. I also promised that I would commit to something myself, except that I don't know what yet; it's probably going to be a game or a web application, I'm not sure in which order. But I promise that I will also prepare a 101 on something cool. So that covers two meetups, and on each meetup there would be something similar.

Audience: So each time you would have one introductory talk and one more advanced?

Yes, I think that would be optimal. Anybody else want to volunteer? I also propose that on the weekend — I'm not sure if it will be this weekend — for whoever is interested, I will be mentoring people in Haskell at the hackerspace. It will probably be Saturday, maybe not early in the morning; around lunchtime and after. If anybody wants to come, please just message me on Meetup or wherever. I'm sure there will be more than one person, because I already got queries.
And if there are too many people, I will just ask Lawrence, or maybe some of the other people who like helping new Haskellers, to join me. I think those people already know who will be asked; and if they feel they should be asked and I didn't ask them, that's my fault, it should just happen. Any requests about future talks, topics that you would like to hear, or suggestions? Please do provide suggestions; I'm sure there should be feedback.

Audience: I was reading about the different classes of programming languages, as someone with absolutely no background in functional programming. I started from that end, and at some point the imperative programming languages and the functional programming languages began to overlap; there is a grayish area. I was reading about Scala and didn't quite see whether it was clearly a functional language or just a JVM language — which class did it belong to? What would help me is a very simple set of examples where I can look at a functional piece of code doing the same thing as an imperative language and clearly see the difference in how things work.

So do you want me to write an imperative DSL in Haskell, not just a DSL in Haskell? I can do that. Writing imperative code in Haskell can be made to look like BASIC; I read blog posts about this. There are people who, as an exercise and also for practical purposes, have written, for example, an embedded BASIC interpreter, so that code can be written in Haskell that looks almost like BASIC, in a few lines of code. There is also a well-known opportunity in implementing an assembler as a monad; some people prefer an assembler as an applicative.

Audience: I think it's a very practical and cool idea, and people will definitely be interested in a talk like this, because it goes over people's heads if you start explaining monads to someone who has written Java code for 18 years. Framing it as a comparison would be a way to grow our audience.

So: types of programming languages, or paradigms of programming, and functional thinking for Java programmers. That will be a bit more difficult, but we can do it: showing how to approach a problem in both an imperative and a functional style, because it's not just a different language, it's a different approach. You can actually code in Java in a functional way, and the talk could show that.

Audience: Okay, good. One more thing: I'm not asking you to do imperative-style coding in Haskell. You could use a normal imperative language.

I will make it look like Java. I will even introduce parentheses for the parameters.

Audience: You could use Java for what Java is supposed to do, and Haskell for what Haskell does, and then show the differences.

That's a bit more difficult; I would need your collaboration. I understand you volunteer, because you know Java?

Audience: I prefer your approach. You don't need Java to show imperative code. I think it will be easier to use one language and show the two approaches. It's not about Haskell versus Java; it's about the imperative and the functional approach.

I think your approach of using the same language is actually better. I would suggest that you just message me; I will prepare the presentation, make sure it's as Java-like as possible, and ask you for feedback.

Audience: For me — I'm coming from a JavaScript background and slowly transitioning into functional programming.
Audience (continuing): What I realized is that I've applied a lot of Haskell and functional programming concepts in JavaScript. As I've learned more and more of these concepts, I realized that I can actually implement a Haskell-like type discipline in plain JavaScript. So I think...

How do you enforce type checks, as in Haskell?

Audience: It's not really about type checking, but about composition: if you have something you can map over, you can derive new methods from it. That's an idea I'm still exploring, because I'm still learning Haskell. But I think that if I manage to do it, it will be educational and help people from any background understand the functional programming concepts.

Okay. So you want to talk about it in a way approachable to us? A mixture of implementing functional ideas in JavaScript.

The other thing: there was a proposal for simple high-performance data processing in Haskell as a tutorial, basically parsing a CSV file and so on. Possibly simple 3D programming in Haskell. Anything else?

Audience: Async programming. And mixing async and IO.

Okay. The way it usually works is that we treat them together, unless we have so-called deterministic parallelism or concurrency, where there is only one way it can finish: it may be asynchronous, but the result can only be one.

Audience: Asynchronous doesn't mean parallel, and I'm talking specifically about asynchronous programming and IO in asynchronous programming.

You mean asynchronous IO?

Audience: No. Like promises. For example, there are server requests and they are served in an asynchronous way, and sometimes a request starts a few other asynchronous tasks, and some of them have to do some IO. So we're mixing: we have an asynchronous event and then we want to do IO — how do we mix them together? One of the things is that it's easy to code yourself into a corner where you don't know how to access IO from your async code.

Okay. Any other requests or suggestions? At the end I would have just one or two requests myself. Oh, cool. Okay, let's wait for the other people. So, who wants to come for the hackerspace mentorship in Haskell? Whoa. Great. Okay, I think I will ask another mentor. Thank you. You're amazing.