CHAPTER 36
INCREMENTAL GARBAGE COLLECTION
We consider here a straightforward use of ~virtual ~memory for large
databases. We simply believe that memory is large. As noted earlier,
a major disadvantage here is the large value of USED. A large value
for USED implies that each garbage collection is expensive,
consuming much more than five seconds, perhaps an hour.
Unlike our earlier method, where we kept most of the database on disk
and exempt from immediate garbage collection, now we force everything
into one homogeneous space, virtual memory. Unfortunately, each
garbage collection will naturally have to consider ~all data in the
database.
In common to all the methods of garbage collection considered so far
is that one must ~wait while garbage collection executes. While five-
second garbage collection might be tolerable, larger garbage
collections might not be.
We now dive into the world of large virtual memory garbage collection
methods. These methods are considerably less efficient overall than
our segregated method, but they have some advantages as well as
disadvantages.
The advantage of a straightforward use of virtual memory over our
earlier, core/disk segregated method is twofold. First, the
programmer needn't be aware of that segregation, as he or she has no
longer to specify where ~fracture can occur (for swapping purposes).
Also, ~sharing is never lost now. The segregated method lost sharing
~between disk-resident structures (Section 23.9), although it preserves
sharing within a single disk structure and among entire disk
structures.
To ameliorate the now huge garbage collection task, we burst it into
many little pieces. Rather than waiting a long time during an entire
GC, we make GC operate ~incrementally. A little bit of garbage
collection occurs at each moment during the user program's
execution.
This method of garbage collection works with variable-sized blocks.
36.1
~New Space
To begin to see how garbage collection can occur in parallel with
user program execution, consider figure 36.1. We invent a new space,
from which new blocks of memory are allocated while the old space is
being garbage collected. This allows the user program to execute, and
naturally acquire new blocks, even while garbage collection occurs in
old space.
36.2
Evacuating From ~Old To ~Copy Space
Old space is garbage collected by copying all reachable blocks in old
space into yet another space, ~copy space (figure 36.2(a)). Upon
completion of this garbage collection, old space will be empty while
all reachable non-new blocks will reside in copy space.
A procedure like MARK, called the ~scavenger, runs in parallel with the
user program. The scavenger copies each encountered block that is
presently in old space.
The scavenger performs as follows, depending ~where an encountered
block resides:
1) If the given block is in ~old space...
a) If the old block has no ~forwarding ~pointer...
i) Copy that block into copy space (and mark it).
ii) Leave a ~forwarding ~pointer in the old block,
which points to its new residence.
iii) Modify the bin containing the old pointer to
hold now the new pointer into copy space.
iv) Recurse for each pointer in the block.
b) If the old block already has a forwarding pointer...
i) Modify the bin containing the old pointer to
now hold the new (forwarding) pointer (into
copy space).
2) If the given block is in ~copy space...
a) If it's marked, do nothing
b) If it's not marked...
i) Mark it.
ii) Recurse for each pointer in the block
3) If the block is in ~new space, do nothing.
When the scavenger is done, we know that all blocks in copy space
reference no block in old space: Each bin containing a pointer into
old space has been replaced by a newer pointer into copy space. Since
the scavenger recurses upon itself for all pointers in any block, all
reachable blocks have been moved from old to copy space.
We are almost ready to reclaim the entirety of old space, as nothing
in it is referenced. This would be completely true, except that some
block in ~new space might point to a block in old space. The new
block could have been created ~before the scavenger copied that old
block into copy space.
To assure that new space, like copy space, references no block in old
space, we attach another responsibility upon the allocation of new
blocks. Whenever a new block is created, and has been filled with
data, all pointers in the block are examined immediately. Each
pointer to a block in old space forces the immediate copying of that
old block into copy space. Thus, the creation of a new block might
take a little longer as all its pointers into old space must be
replaced by pointers into copy space. The copying of referenced
blocks from old to copy space is part of the block allocation cost.
Thus, it is always true that blocks in new space ~never point to blocks
in old space. Given this, and the fact that when the scaventer is
done, no block in copy space points to old space, we rendered ~all
blocks in old space as unreferenced.
36.3
The Flip
When no block in old space is referenced, i.e., when the scavenger is
done, we have the situation shown in figure 36.2(b). Old space is
empty, and all blocks in use reside in copy or new space. At this
point we ~flip.
New and copy space together are now declared to be old space (figure
36.2(c)). The free space that used to be old space is now reused, as
the new new and copy spaces.
We are now left with the situation we started with. All blocks are in
old space, and new and copy spaces are empty. Now, as before, we
initiate the scavenger to copy all reachable blocks from their
present residence in old space into copy space. Also, new blocks
requested by the running user program come out of new space.
BOX: From which space are blocks allocated to the running
BOX: program?
BOX:
BOX: The scavenger does what?
BOX:
BOX: Why is old space completely empty at some time?
BOX:
BOX: What is the flip?
36.4
The Cost
Figure 36.2 is drawn to scale if we assume that about half of old space
is garbage (is unreachable). We've drawn copy space half as large as
old space. Copy space must be large enough to hold all reachable
blocks in old space.
How big must new space be? It must contain enough blocks to feed the
user program during the entirety of a garbage collection. The user
program may eat new blocks at a maximum rate: Conceivably, it could
eat one block in the time it takes to copy one block from old to
copy space. Such a rate of consumption requires new space to be as
large as copy space, as shown in the figure.
Just after a flip, new and copy space, which consume about half of
memory, are empty. Let's continue to assume that half of old space
is garbage. The amount of memory consumed by blocks in use is thus
about one quarter of memory.
Our earlier method of variable-sized block memory management didn't
require an empty copy and new space at any one time. It got by with
just one space the size of old space. Thus, our earlier method
required only half as much memory as this incremental method. However,
the incremental method spreads the cost of garbage collection evenly
over time, without pauses for stand-alone garbage collection.
Again, the need for incremental garbage collection is great when we
use virtual memory (not real memory) to house all blocks. Greater
efficiency is afforded by our core/disk segregation method,
where all garbage collections are limited to real (core) memory. Only
very rarely were garbage collections of the disk required.
The immediate, in-core garbage collections were independent
of the size of the database, which is ~not true with this
homogeneous method of relying on virtual memory to hold the entire
database.
36.5
Static Space
The copying of some large blocks can render an unexpectedly large
expense. For example, a huge block might be a bitmap for a color
video screen. The repeated copying of that much memory becomes
noticable.
For long-lived blocks, like bit maps, another space can be invented,
called ~static space. Blocks in static space are never garbage
collected, except upon specific request for such. The blocks are
all considered in use. Although they are not copied, the scavenger's
trip thru all reachable blocks will necessarily follow all pointers
from all blocks in static space.
36.6
Ephemeral Garbage Collection
We encountered an example of ~ephemeral, or ~partial, garbage
collection in Chapter 35. Ephemeral GC can be applied
also to the homogeneous, virtual memory GC.
The essence of ephemeral GC is the avoidance of traversing thru the
whole database. Older and newer data are segregated into two spaces
(the "older, kept files" and the "newer, replaced files"). The
chasing of pointers is limited to the newer space, and hence the cost
of ephemeral GC is somewhat independent of the size of the entire
database.
To actually reach all reachable newer blocks, old blocks that have been
modified objectively must be taken as additional roots. (Objective
modification in LISP occurs via its operators RPLACA and RPLACD,
which are similar to ICL's @-operator).
With piggy-back VM files, all objectively modified blocks came to rest
in the written VM file. For the homogeneous, virtual memory method of
GC, one must still note older blocks that have been objectively
modified. In SYMBOLIC's LISP machine, the ~page(s) of virtual memory
containing the modified old block are marked specially. All pointers
from all such marked pages are now introduced as more ~roots from which
GC (of the newer space) commences.
Over time, ephemeral garbage collection becomes less and less
effective. More roots are taken unconditionally, and the conservative
guesses ~retain more and more garbage.
However, ephemeral garbage collection usually performs better than
expected. Recently generated garbage is almost always fully
collected. This fact serves well the observed behavior of blocks.
Most allocated blocks cease to be referenced relatively soon after
their allocations. ~Much garbage is in fact recent, and so will
be freed by ephemeral garbage collection.
BOX: How much extra memory is used by incremental garbage
BOX: collection relative to that used in Chapter 34's
BOX: variable-sized memory management.
BOX:
BOX: How big must new space be?
BOX:
BOX: Why have a static space?
BOX:
BOX: Why is incremental garbage collection required when
BOX: relying on virtual memory to hold the whole database?
BOX:
BOX: Which is more efficient overall, the disk/core
BOX: segregated method or the incremental garbage collection
BOX: method?