A few years ago I made some videos laying out a case against object-oriented programming, and since then I've been meaning to make a positive case for an alternative, a prescription for how code should be written rather than how it shouldn't. It's taken me a while to work out what I think can be said with any certainty about how to write code, and confusingly, part of the prescription might sound suspiciously like OO to some ears. I'm not really recanting my prior position, though. Many people just remain confused about what's unique to OO and what's not, hence I'm calling this "Object-Oriented Programming is Good", but with an asterisk. A more honest title for this video might be "Module-Oriented Programming", where module refers to a unit of reasonably self-contained code, but the even more honest title would be "How to Write Good Procedural Code". Encapsulation, polymorphism, and even inheritance will be a part of the story, but as I'll explain, the total picture has profound differences from object-oriented programming.
So here's how I think code should be written. Code should be written in units we'll call modules, and these modules come in two basic kinds: state modules and logic modules. The distinction is that logic modules contain no internal state, i.e. no global variables, and logic modules do not reach out for external state. This means that a logic module consists only of functions, and these functions only touch stateful things which are explicitly passed to them. So a logic function can, for example, read and write to a file if an open file is passed into the function, but a logic function cannot itself open any files. The management of state is not the responsibility of logic code. A logic function may generate new data and may also mutate its inputs, but its sole responsibility is to generate and mutate data exactly as its documentation says it will. The side effects of a logic function are then actually the responsibility of its callers.
So the relationship between state modules and logic modules is strictly one way. State code can call into logic code, but not the other way around. As for the state modules, each should protect its private state, and state modules should only directly touch each other's public interfaces. In other words, the modules should be encapsulated. And even though logic modules have no state to protect, they too should distinguish between public functions and private functions so as to minimize exposed surface area.
Now you might object that modules might sound like objects, but there are two key differences. First, there is no rule about how big modules are allowed to get. They may be tens or even hundreds of thousands of lines long. Very large modules are perhaps not ideal, but I'm not going to give you any hard rules against them. Second, a module is not an instance of a data type. In almost all cases, our state modules are singletons and we're unapologetic about it. Data types in fact don't really belong to any module. The best way to think about data types is that they live outside all modules and when data is transmitted from one module to another, the structure of that data belongs to neither module more than the other. We wouldn't say that a protocol belongs more to a client or more to a server and the same should go for data types. As a practical matter though, a data type must be defined somewhere in code, so we generally define a data type in the module where it's most predominantly used. We may also sometimes want to encapsulate operations on a data type, in which case we'd put the data type and those functions in the same module.
Because state management is an ugly problem, the general goal in any code base is to minimize the proportion of state code. As much as possible, we want to punt code from our state modules into logic modules.
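To make the one-way relationship concrete, here is a small Go sketch under some loud assumptions. Everything in it (`Entry`, `TopN`, `hits`, `Record`, `Leaders`) is hypothetical, and for the sake of a single runnable file the "modules" are sections of one package. In a real code base they would more likely be separate packages, with lowercase names private to each.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a plain data type that travels between modules. It is defined
// here only because it has to be defined somewhere.
type Entry struct {
	Name  string
	Count int
}

// ---- logic: no package-level variables, touches only its arguments ----

// TopN returns the n highest-count entries, with ties broken by name.
// It does not modify the map it is given.
func TopN(counts map[string]int, n int) []Entry {
	entries := make([]Entry, 0, len(counts))
	for name, c := range counts {
		entries = append(entries, Entry{Name: name, Count: c})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Count != entries[j].Count {
			return entries[i].Count > entries[j].Count
		}
		return entries[i].Name < entries[j].Name
	})
	if n < 0 {
		n = 0
	}
	if n < len(entries) {
		entries = entries[:n]
	}
	return entries
}

// ---- state: a singleton whose data is the private variable below ----

var hits = map[string]int{}

// Record is part of the state module's public interface.
func Record(name string) { hits[name]++ }

// Leaders passes the private state down to logic code. The call only ever
// goes in this direction: TopN knows nothing about hits.
func Leaders(n int) []Entry { return TopN(hits, n) }

func main() {
	for _, name := range []string{"a", "b", "a", "c", "b", "a"} {
		Record(name)
	}
	fmt.Println(Leaders(2))
}
```

The state section is three short declarations. The sorting and truncating, which is where most of the bugs would be, sits in a function that can be tested with nothing but a map literal.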
In some programs, state management may inherently predominate, but in many cases, the state code can be a small fraction of the whole code base.
The question that follows is when should we break up modules into smaller modules, and along what boundaries? For state modules, a major reason to break them up is to divide and conquer state management. For either kind of module, we might also break them up simply for organization. All code concerning feature A goes into module A, all code concerning feature B goes into module B, etc. Such structuring can make the code base easier to understand if done well. Just be careful not to overdo it. We also might split up modules for the sake of team collaboration. Group X takes ownership of module X, group Y takes ownership of module Y, and the modules are coupled only through their public interfaces. Even for logic modules, this can help, because it allows one group to change internal implementation as long as the public interface remains unchanged. For similar reasons, we might split up modules to better formalize an externally exposed API. We don't want to bother our external users with details that don't concern them, and we want freedom to change what we've kept private.
Again, many people today attribute these ideas to OO, but the ideas actually predate OO and don't require us to follow the rest of its prescriptions. We should not conflate modules with data types, and we should not obsessively whittle modules down to tiny sizes. So yes, though I spent most of my earlier videos arguing that OO sells an overly aggressive form of encapsulation, I think encapsulation is actually a perfectly useful idea.
But then, what about polymorphism and inheritance? My take on polymorphism is that interface types, or type classes, or traits, or whatever the semi-equivalent is in your language, can be extremely useful.
Their utility, however, is primarily across module boundaries, particularly across the boundary between an API and its consumer. With these interface types, we can formalize commonalities between different data types, including data types defined externally to our own code. Much like a protocol allows us to treat clients, servers, and peers like swappable components, interfaces allow us to interoperate with code that hasn't yet been written. If, however, I have no need to allow for such external extensibility, I generally avoid interfaces. An interface requires me to speculatively generalize, to imagine needs I don't concretely yet have, and this extra burden does not always pay off. For internal business, for modules well contained within my control, I don't really care about that kind of extensibility. For code I control, flexibility is maintained by favoring the simple solutions to my concrete problems rather than entering the realm of speculation.
Now, as for inheritance: inheritance is also a mechanism for expressing the commonality between types, but inheritance additionally shares implementation. A child type automatically shares the implementation of its parent, except for those parts the child overrides. This implementation sharing might be convenient in some cases, but as famously noted, it tends to make code fragile. Changes to ancestor class implementations can affect descendants in unexpected ways, sometimes leading to some pretty nasty bugs. On the other hand, where types have lots of overlap in their data, inheritance makes these overlaps clear in code and more convenient to write in the first place. This is something I think the Go language gets right. A type can inherit the data of another, but this embedding, as they call it, does not create a subtype relationship.
So that's actually all I have to say about how to structure code. I think it's a relatively simple picture overall.
I'll end, though, by elaborating on what I think this implies OO gets wrong. First, as already discussed, OO conflates modules of encapsulation with data types. Second, OO is overly optimistic about how frequently and easily we can create good abstractions. And third, OO favors designs with too many small pieces.
So, about the second point. The underlying premise of OO is that more abstractions are always better and that we can create good abstractions in the normal course of application development. This is wrong. Good abstractions take a lot of hard thought and time to get right, and typically they emerge only slowly over many iterations. Now, when creating an API, creating good abstractions is part of the job, hence creating a good API can be very difficult. In normal application code, however, we should free ourselves from this burden where it's not truly necessary. Cases do arise where new abstractions provide better solutions, but our default mode should not be seeking to create new abstractions.
As for the size of our code units, the difference between procedural and OO is how and when we subdivide the units. In procedural code, we modularize by accretion, only splitting up modules when they get too unwieldy. In object-oriented code, however, we modularize speculatively, splitting up modules in anticipation of problems later. In theory, OO code is flexible. We create a bunch of small independent pieces such that system behavior can be changed by reconfiguring the connections. But in practice, the burdens of speculative generalization rarely pay off. Good structure is about a thing's relations to other things, but OO wants us to build tiny units piecemeal, in isolation. The pieces produced in this process might be easier to build and understand individually, but the whole system is not. Good design requires context, and these tiny units are isolated from context.
Object-oriented design is often sold as the responsible thing to do, the moral equivalent of keeping a closet or drawer meticulously organized. What typical OO designs feel like, though, are tangles of excess packaging. In the worst cases, everything is so fractured, indirected, and abstracted that, like in Wonderland, nothing in code is as it seems. The names of the classes and methods lie about what they actually do, because the real work is always done elsewhere. The procedural alternative, by comparison, encourages us to introduce abstractions only when they're much more likely to be useful.