I have recently been working on garbage collection. The first page here: History of Garbage Collection in Objective-C. It really started with the Boehm conservative garbage collector, written in 1988. It's a venerable piece of software. libFoundation, which was written around the same time as the GNUstep base library, introduced garbage collection to an Objective-C Foundation library in 1998, and I added it to the GNUstep base library in 1999. Apple Cocoa introduced it in 2007. And why do we have garbage collection? Well, the original reason we added it was to make things easier for the programmer. Managing memory is just a pain, so you can decrease the memory management overheads with garbage collection: you don't have to do reference counting, that kind of thing. We did already have a good reference counting mechanism. We wanted our garbage collection system to be portable, cross-platform, because GNUstep is a very cross-platform development system. And we wanted a single code base, essentially, for both the garbage collecting version and the non-garbage-collecting version. And originally we wanted libFoundation compatibility, because we believe in conforming to the standards, and the only other implementation that used garbage collection at all was libFoundation. Another reason for garbage collection was performance. Contrary to a lot of preconceptions, garbage collection at the time actually seemed to work faster in real life than reference counting. That's mainly anecdotal evidence from libFoundation, but certainly libFoundation showed that garbage collecting systems could be comparably fast to normal reference counting. Sorry, but isn't it that the systems which run the GC faster only deal with a small number of objects? If you deal with a lot of objects, it's much slower. When you deal with C++, you can do localization on your own and write your own memory management, a lot quicker. No, that's not necessarily true.
In the Boehm library, for instance, there's thread-local garbage collection, which lets you localize where you're collecting data from, so you don't have to worry about scanning the whole of memory. The Apple implementation also has different mechanisms for avoiding scanning all the memory all the time. And pragmatically, the code that you tend to write in something like C++ or C to handle the memory management is generally slower. Yes, you can optimize it to be faster, but you can also do that in a garbage collected environment in Objective-C. So when you're talking about overall performance for the average user, garbage collection can work out faster. Yeah, generally. And when you're talking about customizing, micro-managing specific little tasks, you can hand-code it yourself in whichever language you like. So why did we use the Boehm library? Well, it was public domain code, which is kind of important. It was already very mature and stable by the time we introduced it, ten years old at that point. It was quite portable; there were a lot of systems supported by the Boehm library. And it was very fully featured: it did pretty much everything you want in a garbage collector by that time. It had already been integrated into the GNU runtime, and it's currently included in the GCC repository, part of GCC, or at least a particular version, because it's not the very latest. And it was said to perform very well. So the way garbage collection is used in the GNUstep base library is that all object allocation is typed, collectible memory. What we mean by that is that at the point where the object is created, we know which instance variables of the object are pointers to things that we might want to keep a reference to within the garbage collection system, and we know that the other instance variables are not, so the garbage collector knows about those and doesn't have to scan them.
When we allocated memory generally, everything was allocated as scanned, collectible by default. So every bit of memory allocated could contain pointers to things, apart from the typed collectible objects, and also with the exception of GSAtomicMallocZone(). In the OpenStep API you allocate memory in different zones, so we introduced that zone to say: here's a zone you don't need to scan, there will be no pointers in this area. The plus point of doing all that was that it was very easy to convert applications. You don't unexpectedly collect memory, because all the stuff that you used to implicitly allocate from the heap in the default zone is now scanned. So it's safe in that respect, but it does mean that it's a bit slower: there's more memory than necessary to scan, data that doesn't contain pointers would be scanned, and possibly the collector would think some value was a pointer and keep something allocated, essentially leaking a little memory. The use of the atomic malloc zone internally meant that within the base library we didn't have any of those problems. The root object, NSObject, was changed to ignore the allocation zone; of course we don't want to allocate in a particular zone any more when we're using garbage collected memory. autorelease, dealloc, release, retain are not used any more, so they just do nothing. retainCount returns UINT_MAX, just to say this object is retained as many times as possible and is not going to be deallocated. Objects are allocated using typed memory, so only the pointer instance variables are scanned. Because we changed that to remove, or effectively comment out, the dealloc method, the paradigm of doing cleanup in the deallocation of objects no longer works. That means that the ideal way to handle cleanup in most objects is to invalidate the object and do explicit cleanup of things like closing file descriptors.
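As a small sketch of how that atomic zone was used (assuming a GNUstep base build where the GSAtomicMallocZone() extension is declared via the Foundation headers), a buffer of plain bytes can be allocated so the collector never scans it:

```objc
#import <Foundation/Foundation.h>

int main(void)
{
  /* A pixel buffer contains no object pointers, so allocating it from
   * the atomic zone keeps it out of every collection scan.
   * GSAtomicMallocZone() is a GNUstep extension to the zone API.     */
  unsigned char *pixels = NSZoneMalloc(GSAtomicMallocZone(), 1024 * 768 * 4);

  pixels[0] = 255;   /* ... fill and use the buffer ... */

  NSZoneFree(GSAtomicMallocZone(), pixels);
  return 0;
}
```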
If we can't do explicit cleanup, we don't know when we're getting rid of an object, it's hanging around somewhere, so we need to use finalization. For finalization we added a new protocol, the GCFinalization protocol. Basically any class which conformed to that protocol was finalized, using the gcFinalize method. The idea is that you shouldn't ever call that method in your own code; it's like dealloc, except that it's called by the garbage collector, so there should never be any need for you to call it. Except, of course, you need to call the superclass implementation if you're writing a subclass of another class that's already finalized. The method is called automatically when the object is collected. Finalization is fairly expensive to use, so you should use it as little as possible. The problem is that when you do a garbage collection run you finalize all the objects that have a finalizer set, and that means you can end up executing quite a lot of code at one time. So you make the finalization method lightweight and minimize that. Weak pointers were added. A weak pointer doesn't prevent the collection of the object it's pointing to. They could be created by registering them in the +initialize method of a class: the first time a message is sent to a class, the +initialize method is implicitly run, and that allows you to say that particular instance variables of all objects of that class are weak pointers. Weak pointers in the Boehm library are not automatically zeroed, so you can get a situation where the object you're pointing to gets collected, because it's only a weak pointer, and your pointer ends up pointing to nowhere. If that's going to be a problem, the object that is going to be collected needs a finalizer method to remove or clear the weak pointer in the other object.
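As an illustrative sketch of the old GNUstep scheme (the class and instance variable names here are invented for the example), an object owning a file descriptor might look like this:

```objc
#import <Foundation/Foundation.h>
#include <unistd.h>

/* Hypothetical example: conforming to GNUstep's old GCFinalization
 * protocol so the collector calls -gcFinalize before reclaiming us. */
@interface FileWrapper : NSObject <GCFinalization>
{
  int fd;   /* non-pointer ivar: never scanned by the typed allocator */
}
@end

@implementation FileWrapper
- (void) gcFinalize
{
  /* Keep finalizers lightweight: just release the external resource.
   * Never call this yourself; the collector invokes it.  A subclass
   * of a finalized class would also call [super gcFinalize] here.   */
  if (fd >= 0)
    {
      close(fd);
      fd = -1;
    }
}
@end
```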
That's very much the same sort of situation that you get with the standard OpenStep retain/release regime where you have, say, a notification observer, and when that observer gets deallocated its dealloc method removes the object from the notification system. Controlling the collector: of course there are times when you don't want garbage collection to happen, if you've got timing-critical code, so you can temporarily disable and re-enable the collector. There are times when you know you've gone round a loop allocating lots of objects that you don't need any more, and you want to reduce the memory footprint of your application, so you can request a collection: either using built-in thresholds, so that if so much memory has been allocated then we'll do a collection, or unconditionally, when we want to collect any garbage right away. We also need to be able to register and unregister zeroing weak pointers; that's done via direct calls to the collector. The other thing we added way back then was a whole load of macros. RETAIN() calls retain if you're compiling on a conventional system, and does nothing, comments it out effectively, if you're compiling for garbage collection; similarly with all the others. One idea of that was performance: back then the overheads of actually making those reference counting method calls were significant, which is really not the case on modern processors. The other reason for putting those macros in was reliability. DESTROY() is a convenient way of zeroing a pointer so that you know the object it points to can be collected; rather than having to remember to release the object and then zero the pointer, you use a macro for it. ASSIGN() helps prevent mistakes in the same sort of way. It builds in a bit of intelligence: standard usage, good practice.
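A short sketch of those macros in use (the Holder class and its ivar are invented for illustration; the macros come from GNUstep's GNUstep.h header):

```objc
#import <Foundation/Foundation.h>

/* Under reference counting the macros expand to retain/release calls;
 * in a garbage collected build they compile away to nothing, so the
 * same source serves both environments.                              */
@interface Holder : NSObject
{
  id value;   /* hypothetical ivar for illustration */
}
- (void) setValue: (id)aValue;
- (void) invalidate;
@end

@implementation Holder
- (void) setValue: (id)aValue
{
  ASSIGN(value, aValue);   /* retains the new value, releases the old */
}
- (void) invalidate
{
  DESTROY(value);          /* releases value and sets it to nil */
}
- (void) dealloc
{
  DESTROY(value);
  [super dealloc];         /* never reached in a GC build */
}
@end
```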
The downside of that is that if you compile code with those macros in a garbage collecting environment, your calls to retain, release, autorelease, whatever, don't exist any more, so your binary can't be run without a garbage collecting version of the library. I'm not sure that that's really a problem. The other downside is purely that they're a bit inelegant. So maybe we'll get rid of those macros. So why did we want to change it all? Well, originally we were libFoundation compatible. There's been no interest in garbage collection in Objective-C for many years; I know that because I know of bugs that have been there for years and no one has complained about them. Possibly that's because retain/release reference counting actually is good: it works very well in general. But now Apple Cocoa have introduced garbage collection themselves and revived the interest in it. libFoundation is obsolete; no one really uses it any more, so there's no point in us trying to be compatible with the libFoundation implementation. So we've changed the whole focus to be Apple compatible now. We're not throwing away the existing work, really, and we're keeping the Boehm library. Why doesn't Cocoa use it, if it's so great? Well, maybe they just like their own code. Certainly they have no need for portability; they can write code optimised for their own system. Their implementation does depend on compiler support, which they've built in, and they can do that because they have a dedicated compiler for their own system. They say they wanted more predictability and they wanted better performance. I don't actually know of any evidence that they have that, but maybe they have. What difference does using Boehm make compared with Apple's implementation? Well, the Apple implementation has zeroing weak memory, and Boehm doesn't have that as such, though it does have zeroing weak pointers. Their implementation does generational garbage collection.
The compiler puts stuff in the code so they can keep track of which objects have been allocated recently, and so they can look at those recent objects first when they're doing a garbage collection run, which hopefully means that they get rid of objects they're likely not to need very quickly, without having to do a full run. On the other hand, the Boehm library has thread-local garbage collection, which means that you don't have to stop all threads to do a garbage collection. There are performance differences either way. The changes to the way the base library garbage collection works, to make it compatible with Cocoa: all object allocation is typed, collectible. That's not a change, it stays the same. We have a new function, NSAllocateCollectable(). That's Apple API. It will give you scanned or unscanned memory, and you can say whether that memory itself can be collected or not. The default malloc zone is unscanned and uncollectible, so you have to ask explicitly. That's completely the opposite of the way the base library used to do it, where we had scanned, collectible by default. That's a major shift, that one point. Finalization: well, they still have finalization, but of course it is different. They have a finalize method where we had gcFinalize, so we had to change all our method names, but that's pretty trivial. All you need to do is implement that method in your subclass to have instances of that class finalized, whereas we had a protocol to flag whether or not the class needs finalization. That was also easy to change. It looks cleaner and simpler, and I prefer finalize as a method name for finalization, so we've already changed to implement that. Zeroing weak pointers: that's a fairly big point of difference. Cocoa added the __weak keyword in the compiler to mark an instance variable as a zeroing weak pointer. We don't have the compiler support yet, though it's possible we could add it to GCC.
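The scanned/unscanned and collectible/uncollectible choices show up directly in that Apple API. A minimal sketch of its use (under MacOS 10.5 semantics as described above):

```objc
#import <Foundation/Foundation.h>

int main(void)
{
  /* Request collectible memory explicitly.  NSScannedOption asks the
   * collector to scan the block for pointers; without it the block is
   * treated as pointer-free and never scanned.                       */
  id *buffer = NSAllocateCollectable(16 * sizeof(id), NSScannedOption);

  /* Unscanned, collectible memory for plain data: */
  char *bytes = NSAllocateCollectable(1024, 0);

  buffer[0] = [NSObject new];  /* kept alive while buffer is reachable */
  bytes[0] = 'x';
  return 0;
}
```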
We have a runtime mechanism to do the same sort of thing. We've added the function GSMakeWeakPointer(), and what you need to do is call that in the +initialize method of your class to say that a particular instance variable is a weak pointer. We also have GSAssignZeroingWeakPointer(), which lets you assign a value to a zeroing weak pointer. Apple don't need that, because their compiler does it implicitly: it knows that the pointer is a zeroing weak pointer because of the new keyword, and the compiler will do the overhead of telling the garbage collecting library that you've made an assignment to that pointer. We have to do it explicitly. Again, if someone adds compiler support we can avoid doing that; hopefully that's not too difficult. Notifications: that's a big part of the way OpenStep applications work, sending notifications to different objects. If a notification center retains an observer, then the observer is never going to be deallocated, and obviously we don't want that. So we don't retain the observer; we use a weak pointer. But if the observer is deallocated and we haven't removed it from the notification center, we're going to crash when we try to send a notification to that object. So the observer has to be removed somehow. Traditionally that's done in the observer's dealloc method. But now we have a zeroing weak pointer used in the notification center, so observers no longer need to remove themselves. We have a similar situation with key-value observing, where an observer normally unregisters itself from an object at the point when it's deallocated. Again, we use zeroing weak pointers, so we avoid the observer having to do that. It's a mechanism that's generally applicable for any sort of system where you have one object observing another object, and it avoids the need to finalize those observers. As I've said before, minimizing the use of finalization is a good thing.
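A minimal sketch of that runtime registration (the Watcher class and its target ivar are invented; the GS… functions are the GNUstep extensions named above):

```objc
#import <Foundation/Foundation.h>

/* Hypothetical observer holding a zeroing weak reference to a target. */
@interface Watcher : NSObject
{
  id target;   /* becomes nil automatically when its object is collected */
}
- (void) setTarget: (id)anObject;
@end

@implementation Watcher
+ (void) initialize
{
  if (self == [Watcher class])
    {
      /* Tell the runtime that the 'target' ivar is a zeroing weak
       * pointer in every instance of this class.                  */
      GSMakeWeakPointer(self, "target");
    }
}
- (void) setTarget: (id)anObject
{
  /* Assignments must go through this function so the collector knows
   * about the weak reference; Apple's __weak does this implicitly
   * through compiler support.                                       */
  GSAssignZeroingWeakPointer((void**)&target, (void*)anObject);
}
@end
```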
The other advantage is that the burden of management is lifted from the application programmer. They no longer need to worry about removing themselves as observers; they can just forget about the whole thing. Traditionally, delegates of objects are not retained. That's so that you don't get retain loops causing delegates never to be released, which is a problem with a reference counting system. With a garbage collecting system you don't have to worry about retain loops, because the garbage collector will resolve that, so we actually have no need to use weak pointers for delegates. The fact that our zeroing weak pointers are currently different from Apple's is actually nothing like as big a problem as it might seem, because almost everywhere that you used to use weak pointers, or would expect to use weak pointers, you don't actually need to any more. So how is collection controlled? Well, before, it was done by direct calls to the Boehm library. Now we use Apple's API, the new NSGarbageCollector class. This class is implemented already, and you don't have to know anything about the Boehm API any more. The +defaultCollector method of the class returns nil if garbage collection is not in use, so it's a great runtime check to see if the application is working in a garbage collecting environment. At the moment, for instance, we use that in the regression testing suite to control which regression tests we run, depending on whether we're running with garbage collection or not; obviously some tests are inappropriate in different setups. collectIfNeeded, the trigger for garbage collection, is really the same as the functionality we have with the Boehm library, so we just wrap the Boehm library's call: there's a threshold amount of memory, and if we've allocated that much since the last garbage collection then a collection takes place. With collectExhaustively we get a full garbage collection run.
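The runtime check and the two collection triggers described above look like this in use (a sketch against the MacOS 10.5 NSGarbageCollector API):

```objc
#import <Foundation/Foundation.h>

int main(void)
{
  NSGarbageCollector *collector = [NSGarbageCollector defaultCollector];

  if (collector == nil)
    {
      /* nil means we are running with retain/release,
       * not garbage collection.                       */
      NSLog(@"garbage collection not in use");
    }
  else
    {
      [collector collectIfNeeded];      /* collect if threshold passed */
      [collector collectExhaustively];  /* force a full collection run */
      NSLog(@"collection is %@",
        [collector isEnabled] ? @"enabled" : @"disabled");
    }
  return 0;
}
```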
disable, enable, isEnabled: they let you stop collection temporarily and tell you whether garbage collection is enabled at the moment. Okay, the NSAutoreleasePool class: it's pretty much obsolete. Adding an object to a pool obviously does nothing; there's no point doing it. Apple added the drain method, so we implemented it too. It does the same thing as collectIfNeeded, so actually I have no idea why Apple added that method; as far as I can see the class really ought to be completely obsolete in a garbage collecting environment, but we did it for compatibility. The NSPointerArray class: it's a new class introduced by Apple in MacOS 10.5 that can hold pointers and/or integers. It can contain nil objects and NULL pointers, which is a kind of departure from all the collection classes traditionally in the OpenStep API. You can tell it to use zeroing weak memory, so that items become nil or NULL when they're collected. Internally we implement that using the zeroing weak pointers that the Boehm collector provides, so from the point of view of an application programmer there's no difference: it works exactly like the Apple implementation. It's not completely implemented in GNUstep yet; I expect that to be finished in about two weeks. NSHashTable: when we originally implemented garbage collection in GNUstep base, way back, we added support for weak pointers there, but they were not zeroing weak pointers. Apple have taken a very different approach: they've actually removed NSHashTable as it was and made it into a full class, with support for zeroing weak pointers, so collected items are removed automatically. We are going to change ours to match that, and I imagine that will probably be ready in about two weeks as well; it's really pretty much the same set of code as NSPointerArray. NSMapTable: the situation is very similar. Originally we supported weak pointers, but only weak pointers for keys and values together.
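The zeroing weak behaviour of NSPointerArray can be requested through its options when the array is created; a brief sketch of the 10.5 API:

```objc
#import <Foundation/Foundation.h>

int main(void)
{
  /* A pointer array whose elements are zeroing weak references:
   * collected objects simply become NULL entries.              */
  NSPointerArray *array = [NSPointerArray pointerArrayWithOptions:
    (NSPointerFunctionsZeroingWeakMemory
     | NSPointerFunctionsObjectPersonality)];

  [array addPointer: [NSObject new]];
  [array addPointer: NULL];            /* NULL entries are allowed */

  NSLog(@"count: %lu", (unsigned long)[array count]);
  return 0;
}
```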
So if your map was set up for weak pointers, then it was weak pointers for both keys and values. Apple have made it into a class, and they've added separate support for weak keys and weak values. We're going to make the same change; again, you're looking at about two weeks before that's done. So we have almost the same garbage collection functionality Apple do. NSHashTable and NSMapTable need updating to be implemented as classes, and direct use of zeroing weak pointers remains different. That's the only difference you'll see between the two APIs now. That will probably change if and when someone puts the compiler support into GCC to automate it for us, but even then, as I say, it's minimally used, so it's not a big deal for portability. And that's it, essentially an overview of all the changes. Basically we've implemented, or will have implemented within a few weeks, the entire garbage collecting API of MacOS 10.5. Questions? Could you make clear what the main strategy of the garbage collector is: tracing from roots to objects, or reference counts, or something combined? Well, you really ought to look at the garbage collecting library's documentation. Essentially it's a conservative garbage collector: it scans memory. A full scan from the roots? A full scan: it scans everything on the stack, everything in the registers, and everything on the heap that it's been told to scan. So it scans through all the heap, which is quite expensive? Yeah, but it will do that on separate threads, separate heaps for separate threads, if you like. And generally speaking it's not that expensive, because most of the memory you allocate doesn't get scanned: the collector knows not to scan it, it knows there are no pointers in it. Or it knows that certain sections of memory have mixtures of pointer and non-pointer parts; that's called typed memory. It knows what type of memory is at different locations.
So it knows which parts it needs to scan and which parts it doesn't, so it's nothing like having to scan the whole heap. In fact, it can make good guesses about what's a valid pointer and all that kind of thing; it's really very good at minimising that. It's a key feature, something that Boehm worked on for many years. Thank you. So how does the collector see which things on the stack are pointers and which are not? The stack: it doesn't know the types there, it has to make assumptions. Well, obviously it knows that some things are the frame pointers of the stack; it has an understanding of the actual stack layout. But when it comes to the actual data variables on the stack, it won't know the type, so it has some fairly basic heuristics. If what it sees on the stack is an integer that has the lowest bit set, it knows it's not a pointer to a block of memory, so it won't bother looking at that. But essentially it's going to look at the whole stack. So it's still scanning the whole stack? Yeah. Anything else? Okay.
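The scanned-versus-atomic distinction discussed in the answers above is visible directly in the Boehm library's C API. A sketch, assuming libgc and its gc.h header are available (Objective-C here is a strict superset of C):

```objc
#include <gc.h>
#include <assert.h>

int main(void)
{
  GC_INIT();

  /* Scanned allocation: the collector will look for pointers here. */
  void **node = GC_MALLOC(2 * sizeof(void *));

  /* Atomic allocation: promised pointer-free, so never scanned. */
  char *text = GC_MALLOC_ATOMIC(64);

  node[0] = text;   /* keeps 'text' reachable via a scanned block */
  assert(node[0] == text);
  return 0;
}
```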