Tommy, he's an engineer at Clinical, and he's going to be talking about software coupling. Take it away. Thank you. So, you already know my name and you already know where I work. You can reach me on the social media thing, which apparently is a good thing. I'm also elected to the NZPUG committee, so if you have questions about what we do and why you should be a member (and you really should), and you're in this room right now, you can come and talk to me. I'm also the conference director for Kiwi PyCon next year, which will be in Dunedin. Yes, the best city of all of them. So if you have questions or ideas, or you want to help out, or you want to give me lots of money, you can come and talk to me about that as well. I'm going to talk to you about three things today. First, it's going to be a wee bit emotional, a little bit touchy-feely, and I'm going to tell you about a psychology of failure, a personal failing that I have. And I'm going to tell you because I suspect many of you in this room share this particular problem. There will then be an ever-so-graceful segue into a brief discussion of code coupling and flexibility. And finally, I'm going to tell you about what you've actually come to the talk for. I'm going to make you wait and talk about this thing called connascence. Hands up: who has heard the word connascence before? Yeah, awesome. Okay, this is good. This is a good state to be in. If you'd all put your hands up, I would have quit. So here's what happens, right? I am a serial project starter. I write a lot of code, some for work, some for personal projects, because I guess I'm a masochist. At the start of a project (this is a graph of velocity over time, and it's hand-drawn, so don't take it too seriously), because I think I'm an okay programmer and I take pride in my work, I try to do things well, right? I try to write good code. I exhaustively test all my code.
And I think I understand things like the SOLID principles, and I think I understand the Law of Demeter. And in the beginning, I have a code base that I'm proud of, right? It's clean. It has tasteful abstractions, if you like. And that means that I can go really fast. I can fix bugs quickly. I can add new features quickly because I understand the code really well. I can go in; there's no trouble modifying it to do the new thing that I need it to do or to fix the bug that I'm trying to fix. And you can kind of see there's a hint here, right? This line is not exactly level. And what happens after a wee while is I wake up one day and I realize that this has happened. I'll wake up and I will need to fix a bug, let's say. The sort of thing I've done hundreds and hundreds of times before on this code base. But this time something's different. The thing that's different this time is that these tasteful abstractions I carefully built at the beginning of the project no longer work for me. Fixing the bug now requires me to rip out code that I had written previously, to change it, to refactor it, and to put new stuff in. Now, sometimes this is easy, right? Sometimes it's as simple as a search and replace throughout the entire code base, maybe replacing one thing with two things or something like that. Sometimes it's a lot harder, though, and it takes longer. And eventually the project velocity levels out. There are no numbers on these axes, this is all hand-drawn, and I've made it more dramatic in order for the presentation to be better. But eventually I settle on some lower velocity. And at this point I kind of beat myself up about it, right? I think I'm clearly not a very good programmer. I clearly don't understand the SOLID principles, or I wouldn't have written this code that I have now had to delete. Like, what a waste of time. Does it sound familiar to anyone? Is it just me? Okay, some people are nodding. Yeah, all right.
Some people were nodding before I asked, is it just me? Now, intellectually I realize that this is a ridiculous thing to be thinking, and the reason it's ridiculous is that the world is not a static place, right? The world changes. And the software that I'm writing, I'm writing hopefully to solve some real-world problem. It is ridiculous to think that my software doesn't have to change over time. And this leads me to think that maybe my priorities were wrong, right? And my assertion, the argument that I'm going to make to you now, is that as an industry, our priorities when we're designing software, when we're designing code, are often wrong. It seems to me we spend an awful lot of time prioritizing correctness. That is, can we solve this particular problem today? And I think instead the most important property of a code base is flexibility to change. And there are two reasons why I think that. First of all, if you're the best programmer in the world, if you have the best development team, the best thing you can do is make your software correct today, right? You can solve today's problem. You can solve today's customer need. But we know that that's not going to be tomorrow's customer need, right? In a week, in six weeks, their workflow is going to change, the real world will change, and you're going to have to change your software. The second reason, and I think this is much more common, is that as a programmer you have an imperfect understanding of the real world. You are building imperfect models of the problem that you're trying to solve. And a lot of the time, the change that we have to build into our software is because we now understand the problem better, right? We've delivered the software to our customer and they've come back and they've said, well, actually in this case our workflow is slightly different. Or, you know, there's this edge case that you haven't thought about; it does the wrong thing in these situations. So we want flexible software.
We want code that responds gracefully to change. But what prevents that in software? When we write code, what prevents us from being able to change it easily? And I was able to quote Benjamin Disraeli before because he happened to say exactly the thing that I wanted to say. And for this next slide, I couldn't find anyone who said what I wanted to say. I'm going to go on a bit of a narcissistic streak and quote myself. So I'm going to say the enemy of flexibility is rigid coupling. And remember this quote, right? Because this will make more sense at the end of the talk. We're going to come back to it. And hopefully I can convince you all that this is true, right? Imagine for a second a code base where you have lots of things. Maybe they're classes, maybe they're functions, doesn't really matter. And none of them are coupled together, right? They don't call each other, they don't depend on each other in any way. It would be very easy to change that hypothetical piece of software, right? You could go in, you could edit any one of those things. And the only thing you would change is the thing that you intended to change. There would be no unintended consequences from that edit. The problem is that's not software. It's not even a library, right? It is completely useless. And in the industry we have this piece of advice, right? We're all told, or at least I was when I did my degree and learned how to program, you should build software with high cohesion and low coupling. And I find this advice useless. It's correct, right? Yes, we should build software with high cohesion and low coupling. And you'll notice it says low coupling, not no coupling. But there's lots of different types of coupling. And this does not help me write better software. It just doesn't, right? What I would like is I would like a way of being more precise with the low coupling part of this. 
I would like a way of saying, well, this type of coupling is probably going to cause me more trouble than this type of coupling. I want a taxonomy of coupling. But it turns out that thing exists. It's called connascence. And I'm very pleased that very few of you have heard of it before. It's a proper English word, and it means two things that are born together, and it implies that they change together over time. You know, they're linked in some way. So you can see how this applies to software coupling. I didn't invent the term. It was first coined in 1991, I think, by Meilir Page-Jones. Since then, a number of people have expanded on the topic. And connascence is a taxonomy of coupling. So there are lots of different types of connascence. We're going to go through all of them today, but it's going to be fine. It won't be too rushed. This will make more sense soon. We'll come back to this. Because we're all programmers, we love code. Here are two pieces of code from different bits of a code base. At the top, it looks like we have some model. And at the bottom, we have a function that takes an instance of that model. How are these two things coupled? Yes, there is audience participation. Shout out your answer. By name, right? The top one declares a method called set_email, and the bottom one calls that set_email name. If I change the top one, I have to change the bottom one. And at this point, at least some of you are uncharitably thinking that maybe you want to be in the other room and the other talk looks a lot better. I promise you this will get better. This is connascence of name. These two are linked by name. When you change the name of one, you have to change the other. But every piece of connascence has three properties that you have to consider together. And the first one is strength. Strength is this vertical axis here. And this list of connascences (we've just looked at connascence of name) is ordered by strength. Connascence of name is the weakest connascence.
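The slide code isn't captured in the transcript; here is a minimal Python sketch of the kind of pairing the speaker describes (the names Customer and do_something are illustrative assumptions, not from the talk):

```python
class Customer:
    def __init__(self):
        self.email = None

    def set_email(self, email):
        # The name declared here...
        self.email = email


def do_something(customer):
    # ...must match the name used here. Rename set_email above and this
    # call breaks: connascence of name. The parameter name `customer` and
    # the reference on the next line are a second, very local instance of
    # the same connascence.
    customer.set_email("jane@example.com")
    return customer.email
```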
So it's not that bad, right? The reason why it's weak is that if you wanted to make that refactoring, you could do a search and replace in your editor, right? We have tools that help us find all the instances where set_email is called and change the name. It wouldn't be that hard a thing to change. So that is the strength of a connascence. It's basically: how hard is this type of coupling going to be to change in the future? There is another instance of connascence of name on the screen. Can anyone spot it? Anyone at all? Sorry? What about it? It might, but it's not the magic answer I'm looking for. So I'll tell you: the second instance of coupling is that I have this variable called customer in the parameter list for do_something, and on the line afterwards, I'm referencing it, right? If I change the name of this variable, I must also change the line afterwards. Now, at this point, you're all thinking, well, this talk's getting really bad. Like, duh, of course. But this is the second property that you have to consider. And it's called locality. And this means how close the two connascent elements are to each other. Now, I said you have to consider all three of these properties together. What this means is that you can trade things off against each other. You can have stronger types of connascence that are close together, or you can have weaker types of connascence that are far apart. What this means is that in your code, you'll have something like this, where red arrows represent strong types of connascence and green arrows represent weak types of connascence. There's a very interesting interplay between this idea of having strong types of coupling close together and cohesion. But that's a totally different talk. I'll just mention it in passing. The third property that you have to consider is the degree. And that is simply how many pieces are affected by this coupling. Is it two things? Is it a hundred things?
So these three properties together give you the tools you need in order to determine if this is a problem that you ought to be worried about, that you ought to be fixing. Strength, locality, and degree. Strength is the only thing that is predetermined. The locality and the degree depend on your code base. So let's look at some stronger types of connascence. We've done connascence of name, which is the weakest. Connascence of type is when multiple components must agree on the type of an entity. And as Python programmers, Python being a dynamically typed language, we have a very special relationship with connascence of type. Again, let's look at some code. Here's a function that takes an iterable of prices and computes a running total. There is coupling here between the definition of this function and everywhere this function is called. If I call it like this, the function is going to work, right? It's going to give me an answer, but the answer is not going to be what I want. If I call it like this, who thinks it's going to give me what I want? Hands up? Who thinks it's not going to give me what I want? Who's too shy to commit to either option? Yeah, right. So it turns out it's actually still not going to give me what I want, because floating point is a wonderful, wonderful mess. But these costs are for products, right? And because we have products, we're probably storing them in a database. So we're also going to have some database table definitions somewhere, so the locality of the connascence in this case actually crosses two code bases, because that column type perhaps cannot store that answer. So we can end up in weird situations where we have data in our program that is not representable in our database. We have to consider the entire product, and often we'll find locality that crosses code bases or crosses languages. Connascence of type is still at the relatively benign end of the spectrum, right? Let's look at something a wee bit stronger.
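The exact slide code is not in the transcript, but a running-total function with the three call styles the speaker describes might look like this sketch (function and variable names are assumptions):

```python
from decimal import Decimal


def total(prices):
    # The definition here and every call site must agree on the element
    # type: connascence of type.
    running = prices[0]
    for price in prices[1:]:
        running += price
    return running


# Strings "work" (no exception), but the answer is concatenation, not a sum:
strings_result = total(["10.00", "0.50"])        # "10.000.50"

# Floats look right, but binary rounding creeps in:
floats_exact = total([0.10, 0.20]) == 0.30       # False

# Decimal keeps exact cents, matching what a DECIMAL column can store:
decimal_exact = total([Decimal("0.10"), Decimal("0.20")]) == Decimal("0.30")
```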
Connascence of meaning is about attaching semantic meaning to things that inherently don't have it. If we imagine a code base that does credit card processing, for example, we might have code that looks like this, right? We want to be able to put test card numbers through, and in this case, this is the function that validates the credit card numbers, and if you put the test card number in, you want it to always return true. But you're also going to have a function that makes a payment. You're also going to have a function that does a refund. And you're going to have to repeat this magic string literal everywhere in your code. We have connascence of meaning between everywhere that we're using this string literal. Now, again, this is easy to fix, right? We can do this. We can store the string literal in a named constant, and now we have changed from connascence of meaning to connascence of name. We've gone to a weaker form of connascence, which is good. But there is a trade-off here. In going to a weaker form of connascence and reducing the strength, we have also slightly increased the degree, because we now need a place to declare this test card constant. Now, maybe in your code base that's a good thing to do. Maybe it's not. You can see how connascence of meaning is still reasonably easy to change, because we could search through our code base and find every instance of this long string. But what about this example? It's exactly the same thing, but now the literal that we're using is going to be a lot harder to find. You're probably going to have the number two all throughout your code base in all sorts of unrelated places. And again, we can make the same refactoring in order to fix this. Let's get a wee bit stronger. Connascence of position: multiple entities must agree on the order of values. Here we have two pieces of code. You have to imagine that the first one is pulling something from a database.
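A hedged sketch of the refactoring described above. The card number and function names are illustrative stand-ins (4111111111111111 is a commonly used test-card value, but the slide's actual literal isn't in the transcript):

```python
# Before: every function repeats the magic literal. Only a human knows
# this particular string means "test card": connascence of meaning.
def validate_card(number):
    if number == "4111111111111111":
        return True
    return number.isdigit() and len(number) == 16  # stand-in for real checks


# After: the meaning lives in one named constant. The remaining coupling
# is the weaker connascence of name, at the cost of one extra declaration
# (the degree goes up slightly, as the talk notes).
TEST_CARD = "4111111111111111"


def make_payment(number, amount):
    if number == TEST_CARD:
        return "test-approved"
    return "charged"
```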
It's probably not going to be static values hard-coded in there, but it's returning a user object, and in this case it's just a list of values. In our system, we can do dangerous things, but before doing the dangerous things, we check that the user is an administrator. So this code works, but what happens when we decide that we actually need to store middle names? At this point, we have two not-very-good options. We could store the middle name at the end of the list, but then we still have connascence of position: everywhere that wants to access the middle name has to remember that it's not obvious. It's not logical. It's not first name, middle name, last name. It's something unintuitive, and that's a bit of a problem. Or we can do what's intuitive and store it between the first name and the last name. But at that point, if we fail to change even one place where we're doing this check for the administrator, then everyone who was not born in the year zero can now do dangerous things. This is clearly not a good idea. We can improve this by changing the structure, and that fixes that particular problem. You can hopefully see, though, that in this case, if all of the bits of code that used this particular value were very close together in the code base, it would be a lot easier to change. You could find them all more easily in the same file. Connascence of position also occurs in API design. This is why, if you've read the clean code advice, they say you should limit the number of parameters to a function. It's because if you have lots of parameters, everywhere that you are calling that function you now have to remember: is it first name, last name, email, or is it email, first name, last name? If the types are compatible with each other, you get a system that doesn't throw an error, but doesn't do anything sensible either. Connascence of algorithm is when multiple components must agree on a particular algorithm.
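The slide isn't captured, so this is a reconstruction of the failure mode under assumed names, with the flag at an assumed index; the truthiness bug is the point:

```python
from dataclasses import dataclass


def load_user():
    # Stands in for a database fetch; the position of each value carries
    # its meaning: [first_name, last_name, is_admin].
    return ["Ada", "Lovelace", False]


def can_do_dangerous_things(user):
    return bool(user[2])   # index 2 must mean "is admin" everywhere


# Insert a middle name at the "intuitive" position, and any check we forget
# to update now reads the middle name where the flag used to be. A
# non-empty string is truthy, so a non-admin sails through:
shifted = ["Ada", "King", "Lovelace", False]


# The fix the talk suggests: change the structure. Named fields turn
# connascence of position into the much weaker connascence of name, and
# adding middle_name no longer shifts anything.
@dataclass
class User:
    first_name: str
    last_name: str
    is_admin: bool
    middle_name: str = ""


def can_do_dangerous_things_safely(user):
    return user.is_admin
```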
We're now getting to reasonably strong types of software coupling. Let's imagine a web app for a moment. I don't know what it's for, but users sign up, and when they sign up, they have to provide an email address. And our back end is Python, because of course it is, and we want to make sure that users don't give us completely garbage email addresses. And so we're doing some sort of best-effort thing to verify that, you know, they didn't give us just their name. And because we're masochists, we're using regular expressions for some reason. But our front-end developers, that's all this JavaScript stuff, and I don't understand how that works. And they have some library that validates form input, but it's using a different regular expression. So now we have two algorithms that are supposed to do the same thing, and they don't agree. We can get into situations where users get errors on the front end, but the back end would happily accept the input. What's worse, we can get into a situation where the front-end code works and says, yes, your email address is fine, but the back end doesn't accept it. This also happens a lot in tests that are poorly expressed, where in this particular test the test author has looked at the production code and said, ah, you know, they're using an MD5 hash under the hood; I'm going to do the same thing in my test. Instead of testing the intent here, which is that, you know, the hash is unique for different users. We also get connascence of algorithm any time we're taking complex types, serializing them to some simpler medium, and then reconstituting them. Any time you save data to disk, right, what is it? Is it UTF-8? Is it ASCII? Is it UTF-7, because you're doing something with IMAP? We need to think about these things. If the algorithms don't agree, we can have serious issues. So we've looked at the first five, and it turns out this isn't bad typesetting.
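The test example from the talk might look something like this sketch (user_token and the test names are assumed, not from the slides):

```python
import hashlib


# Production code: the hash algorithm is an implementation detail.
def user_token(username):
    return hashlib.md5(username.encode()).hexdigest()


# A poorly expressed test mirrors the algorithm instead of the intent. If
# production ever switches to sha256, this test must change in lockstep,
# even though nothing observable broke: connascence of algorithm.
def test_token_mirrors_algorithm():
    assert user_token("alice") == hashlib.md5(b"alice").hexdigest()


# Better: test the intent the speaker describes, that tokens are stable
# and differ between users.
def test_token_intent():
    assert user_token("alice") == user_token("alice")
    assert user_token("alice") != user_token("bob")
```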
There is actually a reason for the slight gap between connascence of algorithm and connascence of execution order. The reason is that the first five connascences are called static connascences, and that means you can reason about them knowing nothing other than your source code, right? You can look at your source code, you can find them, and you can fix them. The last four are dynamic connascences, and in order to reason about those, you have to understand the runtime properties of your system. This has a couple of effects. First of all, it makes them much, much stronger, right? Weird things happen at runtime. It can be very hard to predict exactly what your program is going to do. The second thing it means is that it's actually really hard to come up with small examples that fit on one slide. If we take something that is a dynamic connascence and boil it down and remove all the extraneous bits so it fits on one slide, it loses some of its charm. So we're going to go over the last four very quickly, and some of the examples are perhaps not great. I beg for your forgiveness. The first is connascence of execution order, which is when multiple instructions must be run in a very particular order. So the classic thing here is you have some resources and they have to be locked because you're accessing them from multiple threads. These two functions, if I run them on different threads and I am unlucky, will deadlock. They are locking and unlocking my mutexes in different orders. If you have ever spent a week trying to track this down in a large code base, right now you should feel a kind of chilling dread at this code. So this is a problem, right? And maybe we can refactor this code so as not to require so many mutexes. Maybe we can't, though. Maybe instead of reducing the strength, we can reduce the locality by moving all the code that accesses locked or lockable resources to the same file. And that way, when we get a deadlock, we know there's only one file to look at.
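A minimal sketch of the deadlock-prone shape the speaker describes (the lock and function names are assumptions; the actual slide code isn't in the transcript):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def transfer():
    # Acquires A, then B...
    with lock_a:
        with lock_b:
            return "transferred"


def audit():
    # ...but this acquires B, then A. Run the two on different threads at
    # an unlucky moment and each holds the lock the other wants: deadlock.
    # The coupling (connascence of execution order) is invisible when you
    # read either function alone.
    with lock_b:
        with lock_a:
            return "audited"


def audit_fixed():
    # The usual fix: every function agrees on one global lock ordering
    # (always A before B).
    with lock_a:
        with lock_b:
            return "audited"
```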
They're easier to find. Maybe we can reduce the degree, eliminating some of the bits of code and refactoring them so that the number of places where we're having to lock and unlock these mutexes is reduced. Connascence of timing is when the timing of multiple instructions is important. This happens a lot in distributed systems. So if you're doing microservices like my team is at work, we can have this issue where we're talking to some account server in multiple places, and because it's an HTTP call and the account server might be down or there might be network issues, we give it a timeout. If the timeout is not the same across all our services, we can have weirdness where the user was able to authenticate, but in some back-end server several layers deep, something died. But it's not consistent across all the services. So now we're trying to debug something and it becomes hard to reason about. Connascence of value is about multiple values changing together. Again, we often see this in test code. You have to imagine this is some checkout system: we're scanning a barcode and then asserting that the price that we get out should be a particular value. Note we also have connascence of meaning here, because 2.5 as a float is not a sensible way to store a price. But anyway, this price has business significance. It probably comes from a database. It probably comes from a customer. If we change it, all our test code is going to break, despite the fact that the checkout is still working fine. And finally, connascence of identity is when multiple components must reference (this slide should say) the same entity. We see this, for example, if you're using an ORM that uses the active record pattern. You have an object that represents a row in a database, and if you have multiple pieces of code that need to update that row, they have to talk to the same object. You can't talk to a copy of it.
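A minimal sketch of the identity idea with a shared work queue (names are illustrative; the talk's slide code isn't in the transcript):

```python
import queue

# Connascence of identity: producers and workers must talk to the SAME
# queue object. An equal-looking but distinct Queue is useless; identity,
# not equality, is what matters.
work = queue.Queue()


def producer(q):
    q.put("job-1")


def worker(q):
    return q.get_nowait()   # raises queue.Empty if q never saw the item


producer(work)
result = worker(work)       # same object, so the job is found

other = queue.Queue()       # a distinct queue: worker(other) would raise
```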
We can also see this sometimes in multi-threading patterns, where we have multiple worker threads that are pulling work items off a queue, and if you want to put work on that queue, you have to talk to that queue. You can't talk to another copy of it. So we've now gone through all the connascences. Hopefully you can see how the ones at the bottom are stronger than the ones at the top, and that when we're refactoring, if we can move our code towards the weaker connascences, or if we can reduce the degree or the locality, we end up with a code base that's more flexible to change, that's easier to change. Throughout this talk I've been giving links to a website called connascence.io. I've been preparing this talk for a couple of months, and one of the things that I had trouble with is that there's no good single reference for connascence. I think this is an amazing idea, or I wouldn't be talking to you. So I did what stupid people do and thought, oh, I'll build a website; how hard can it be? But it turns out I'm a terrible, terrible web developer. This is not what I do professionally. I'd be the first to admit that it's awful. It is open source, though, and patches are most welcome. If you also like the idea and you'd like to help me with it, please come and talk to me this weekend or afterwards. So, to try to wrap things up: we started out by saying that the real world is an ever-changing place. It's an ever-evolving place, and our job is to write software that hopefully solves problems in the real world. Because of that, our software has to change. I then went on a narcissistic streak, quoted myself, and said the enemy of flexibility is rigid coupling, and hopefully now this has slightly more significance for you. Hopefully now at least some of you are thinking, well, I can replace the words rigid coupling with strong connascence, right? Or connascence where the strength, the degree, or the locality is high. But the first one is a better sound bite.
And then we started talking about connascence, and there are four things I want to leave you with. First of all, we can't just replace the word coupling with the word connascence, right? That doesn't get us any further forward. The nice thing about connascence is that it's a taxonomy of coupling. It allows us to look at coupling, divide it into different groups, reason about them, and talk about their properties. Second, it is a software quality metric, and like all metrics, it's flawed. It's not perfect. The nice thing about connascence is that of those three properties, only one is predetermined, right? The strength is predetermined. The locality and the degree are properties of your code base. So I can't tell you, for any particular piece of coupling, whether it's worth refactoring or not, all right? That's a decision only you can make with an understanding of your code base. Thirdly, like design patterns and many other things, it gives us as engineers, as developers, a shared vocabulary. So I can go into a code review with my colleagues and say, look, we have connascence of algorithm between these different code bases. It allows us to be a lot more expressive in our conversations with each other. And finally, as I've been making these slides, it's been interesting to me how often other principles that we've learned as engineers have popped out of thinking about code in terms of connascence. So, for example, primitive type obsession is connascence of meaning, right? Yes, it's another word for it, but when we use connascence, because we have this more nuanced view, it allows us to reason about our code a bit more. This is all I have, but I will certainly answer any questions that you have. Any questions? Oh, yes. Thank you. The connascence of identity example you gave and connascence of name, so the very top and bottom of the list, seem very, very similar. Maybe this isn't the time to drill into that.
They do seem similar, but the difference is that identity is dependent on runtime things, right? Whereas name... so, let me think. Yes: in Python, and in lots of other things as well, right, even in a statically typed language, if we think of the active record pattern, I can have a row object and rename the variable. So the actual name that I'm using to refer to it can change. The important thing is that I'm referencing that thing in memory, not a copy of it or something like that. But I mean, one of the things is, and I'll just go back to this list here, I'm not totally satisfied with this, right? This is an evolving idea, and in particular I've been thinking a lot about how connascence of type in a dynamically typed language sits, I think, significantly further down the list than it appears here. I think this is probably a good categorization for a statically typed language. But I need to think about that some more. Do you think it's well enough defined, since you talk about it as a metric, that it would be possible to build an analysis tool that could put out those metrics? I'm glad you asked that question. I think for the static connascences, in theory, it is certainly possible to build a tool that will find them all, measure their degree, measure their locality, and give you advice. First of all, I don't think that's possible for the dynamic connascences. Second of all, I don't think it's a good idea. One of the things that I love about this is that it's not black and white, right? You have to use your head and think, well, actually, in this code base I'm going to make an exception for this reason. However, there are tools out there that approximate this for the static connascences. So yeah, I think it's possible. I'm not convinced it's a good idea. Over here. Thanks for your talk. Can you please summarize the difference between strength and degree? Because I'm not quite sure of the difference between them.
Yeah, so let's look at connascence of algorithm, which is a very strong form of connascence. And it's strong because, if you have this system, this web app, let's say you're doing this in several places, right? You might have a more distributed back-end system. It's going to be hard to find all the places where you're validating email addresses. They might be in multiple libraries, but even if they're not, what are you going to search for? Right? I mean, you could try searching for that exact regular expression, but then you're not going to find all the places that use a subtly different way of doing it. So strength is predetermined, and it's based on how hard it's going to be to find that coupling and to fix it. Degree, on the other... sorry, locality was the other thing, right? So locality is simply how close together these things are. Ignore for a second the fact that the second piece of code here is JavaScript, and let's imagine for a second that these two are actually in the same file, right next to each other. If you find the first one, you're going to find the second one very easily. Having the code closer together means that as a developer it's easier to spot the other connascent elements. If they're far apart, it's harder, right? If these happen to be in separate Python modules in the same code base, then that's a bit harder. If they're in totally separate code bases, that's much harder. So strength and locality really do form this kind of trade-off, yeah. And degree is how many places are connected. So here we have two, but maybe there are two more places elsewhere in the code base that also do the same job, right? So it's about how many things you have to worry about as you're doing your refactoring. We've got time for one final question if there is one. Okay, let's give a round of applause for Tommy. Thank you.