CHAPTER 36 INCREMENTAL GARBAGE COLLECTION We consider here a straightforward use of ~virtual ~memory for large databases. We simply believe that memory is large. As noted earlier, a major disadvantage here is the large value of USED. A large value for USED implies that each garbage collection is expensive, consuming much more than five seconds, perhaps an hour. Unlike our earlier method, where we kept most of the database on disk and exempt from immediate garbage collection, now we force everything into one homogeneous space, virtual memory. Unfortunately, each garbage collection will naturally have to consider ~all data in the database. In common to all the methods of garbage collection considered so far is that one must ~wait while garbage collection executes. While five- second garbage collection might be tolerable, larger garbage collections might not be. We now dive into the world of large virtual memory garbage collection methods. These methods are considerably less efficient overall than our segregated method, but they have some advantages as well as disadvantages. The advantage of a straightforward use of virtual memory over our earlier, core/disk segregated method is twofold. First, the programmer needn't be aware of that segregation, as he or she has no longer to specify where ~fracture can occur (for swapping purposes). Also, ~sharing is never lost now. The segregated method lost sharing ~between disk-resident structures (Section 23.9), although it preserves sharing within a single disk structure and among entire disk structures. To ameliorate the now huge garbage collection task, we burst it into many little pieces. Rather than waiting a long time during an entire GC, we make GC operate ~incrementally. A little bit of garbage collection occurs at each moment during the user program's execution. This method of garbage collection works with variable-sized blocks. 36.1 ~New SpaceTo begin to see how garbage collection can occur in parallel with user program execution, consider figure 36.1. We invent a new space, from which new blocks of memory are allocated while the old space is being garbage collected. This allows the user program to execute, and naturally acquire new blocks, even while garbage collection occurs in old space. 36.2 Evacuating From ~Old To ~Copy Space
Old space is garbage collected by copying all reachable blocks in old space into yet another space, ~copy space (figure 36.2(a)). Upon completion of this garbage collection, old space will be empty while all reachable non-new blocks will reside in copy space. A procedure like MARK, called the ~scavenger, runs in parallel with the user program. The scavenger copies each encountered block that is presently in old space. The scavenger performs as follows, depending ~where an encountered block resides: 1) If the given block is in ~old space... a) If the old block has no ~forwarding ~pointer... i) Copy that block into copy space (and mark it). ii) Leave a ~forwarding ~pointer in the old block, which points to its new residence. iii) Modify the bin containing the old pointer to hold now the new pointer into copy space. iv) Recurse for each pointer in the block. b) If the old block already has a forwarding pointer... i) Modify the bin containing the old pointer to now hold the new (forwarding) pointer (into copy space). 2) If the given block is in ~copy space... a) If it's marked, do nothing b) If it's not marked... i) Mark it. ii) Recurse for each pointer in the block 3) If the block is in ~new space, do nothing. When the scavenger is done, we know that all blocks in copy space reference no block in old space: Each bin containing a pointer into old space has been replaced by a newer pointer into copy space. Since the scavenger recurses upon itself for all pointers in any block, all reachable blocks have been moved from old to copy space. We are almost ready to reclaim the entirety of old space, as nothing in it is referenced. This would be completely true, except that some block in ~new space might point to a block in old space. The new block could have been created ~before the scavenger copied that old block into copy space. To assure that new space, like copy space, references no block in old space, we attach another responsibility upon the allocation of new blocks. Whenever a new block is created, and has been filled with data, all pointers in the block are examined immediately. Each pointer to a block in old space forces the immediate copying of that old block into copy space. Thus, the creation of a new block might take a little longer as all its pointers into old space must be replaced by pointers into copy space. The copying of referenced blocks from old to copy space is part of the block allocation cost. Thus, it is always true that blocks in new space ~never point to blocks in old space. Given this, and the fact that when the scaventer is done, no block in copy space points to old space, we rendered ~all blocks in old space as unreferenced. 36.3 The Flip When no block in old space is referenced, i.e., when the scavenger is done, we have the situation shown in figure 36.2(b). Old space is empty, and all blocks in use reside in copy or new space. At this point we ~flip. New and copy space together are now declared to be old space (figure 36.2(c)). The free space that used to be old space is now reused, as the new new and copy spaces. We are now left with the situation we started with. All blocks are in old space, and new and copy spaces are empty. Now, as before, we initiate the scavenger to copy all reachable blocks from their present residence in old space into copy space. Also, new blocks requested by the running user program come out of new space. BOX: From which space are blocks allocated to the running BOX: program? BOX: BOX: The scavenger does what? BOX: BOX: Why is old space completely empty at some time? BOX: BOX: What is the flip? 36.4 The Cost Figure 36.2 is drawn to scale if we assume that about half of old space is garbage (is unreachable). We've drawn copy space half as large as old space. Copy space must be large enough to hold all reachable blocks in old space. How big must new space be? It must contain enough blocks to feed the user program during the entirety of a garbage collection. The user program may eat new blocks at a maximum rate: Conceivably, it could eat one block in the time it takes to copy one block from old to copy space. Such a rate of consumption requires new space to be as large as copy space, as shown in the figure. Just after a flip, new and copy space, which consume about half of memory, are empty. Let's continue to assume that half of old space is garbage. The amount of memory consumed by blocks in use is thus about one quarter of memory. Our earlier method of variable-sized block memory management didn't require an empty copy and new space at any one time. It got by with just one space the size of old space. Thus, our earlier method required only half as much memory as this incremental method. However, the incremental method spreads the cost of garbage collection evenly over time, without pauses for stand-alone garbage collection. Again, the need for incremental garbage collection is great when we use virtual memory (not real memory) to house all blocks. Greater efficiency is afforded by our core/disk segregation method, where all garbage collections are limited to real (core) memory. Only very rarely were garbage collections of the disk required. The immediate, in-core garbage collections were independent of the size of the database, which is ~not true with this homogeneous method of relying on virtual memory to hold the entire database. 36.5 Static Space The copying of some large blocks can render an unexpectedly large expense. For example, a huge block might be a bitmap for a color video screen. The repeated copying of that much memory becomes noticable. For long-lived blocks, like bit maps, another space can be invented, called ~static space. Blocks in static space are never garbage collected, except upon specific request for such. The blocks are all considered in use. Although they are not copied, the scavenger's trip thru all reachable blocks will necessarily follow all pointers from all blocks in static space. 36.6 Ephemeral Garbage Collection We encountered an example of ~ephemeral, or ~partial, garbage collection in Chapter 35. Ephemeral GC can be applied also to the homogeneous, virtual memory GC. The essence of ephemeral GC is the avoidance of traversing thru the whole database. Older and newer data are segregated into two spaces (the "older, kept files" and the "newer, replaced files"). The chasing of pointers is limited to the newer space, and hence the cost of ephemeral GC is somewhat independent of the size of the entire database. To actually reach all reachable newer blocks, old blocks that have been modified objectively must be taken as additional roots. (Objective modification in LISP occurs via its operators RPLACA and RPLACD, which are similar to ICL's @-operator). With piggy-back VM files, all objectively modified blocks came to rest in the written VM file. For the homogeneous, virtual memory method of GC, one must still note older blocks that have been objectively modified. In SYMBOLIC's LISP machine, the ~page(s) of virtual memory containing the modified old block are marked specially. All pointers from all such marked pages are now introduced as more ~roots from which GC (of the newer space) commences. Over time, ephemeral garbage collection becomes less and less effective. More roots are taken unconditionally, and the conservative guesses ~retain more and more garbage. However, ephemeral garbage collection usually performs better than expected. Recently generated garbage is almost always fully collected. This fact serves well the observed behavior of blocks. Most allocated blocks cease to be referenced relatively soon after their allocations. ~Much garbage is in fact recent, and so will be freed by ephemeral garbage collection. BOX: How much extra memory is used by incremental garbage BOX: collection relative to that used in Chapter 34's BOX: variable-sized memory management. BOX: BOX: How big must new space be? BOX: BOX: Why have a static space? BOX: BOX: Why is incremental garbage collection required when BOX: relying on virtual memory to hold the whole database? BOX: BOX: Which is more efficient overall, the disk/core BOX: segregated method or the incremental garbage collection BOX: method?