CHAPTER 36


				INCREMENTAL GARBAGE COLLECTION



	  We consider here a straightforward use of ~virtual ~memory for large
	  databases.  We simply believe that memory is large.	 As noted earlier,
	  a major disadvantage here is the large value of USED.  A large value
	  for USED implies that each garbage collection is expensive,
	  consuming much more than five seconds, perhaps an hour.

	  Unlike our earlier method, where we kept most of the database on disk
	  and exempt from immediate garbage collection, now we force everything
	  into one homogeneous space, virtual memory.  Unfortunately, each
	  garbage collection will naturally have to consider ~all data in the
	  database.

	  In common to all the methods of garbage collection considered so far
	  is that one must ~wait while garbage collection executes.	 While five-
	  second garbage collection might be tolerable, larger garbage
	  collections might not be.

	  We now dive into the world of large virtual memory garbage collection
	  methods.	These methods are considerably less efficient overall than
	  our segregated method, but they have some advantages as well as
	  disadvantages.

	  The advantage of a straightforward use of virtual memory over our
	  earlier, core/disk segregated method is twofold.  First, the
	  programmer needn't be aware of that segregation, as he or she has no
	longer to specify where ~fracture can occur (for swapping purposes).
	Also, ~sharing is never lost now.  The segregated method lost sharing
	~between disk-resident structures (Section 23.9), although it preserves
	sharing within a single disk structure and among entire disk
	structures.

	To ameliorate the now huge garbage collection task, we burst it into
	many little pieces.  Rather than waiting a long time during an entire
	GC, we make GC operate ~incrementally.  A little bit of garbage
	collection occurs at each moment during the user program's
	  execution.

	  This method of garbage collection works with variable-sized blocks.


36.1
~New Space

	pic

	  To begin to see how garbage collection can occur in parallel with
	  user program execution, consider figure 36.1.	 We invent a new space,
	  from which new blocks of memory are allocated while the old space is
	  being garbage collected.  This allows the user program to execute, and
	  naturally acquire new blocks, even while garbage collection occurs in
	  old space.


36.2
Evacuating From ~Old To ~Copy Space

	pic

	  Old space is garbage collected by copying all reachable blocks in old
	  space into yet another space, ~copy space (figure 36.2(a)).  Upon
	  completion of this garbage collection, old space will be empty while
	  all reachable non-new blocks will reside in copy space.

	  A procedure like MARK, called the ~scavenger, runs in parallel with the
	  user program.  The scavenger copies each encountered block that is
	  presently in old space.

	  The scavenger performs as follows, depending ~where an encountered
	  block resides:

	     1)   If the given block is in ~old space...

			 a)	If the old block has no ~forwarding ~pointer...

				   i)	  Copy that block into copy space (and mark it).
				   ii)  Leave a ~forwarding ~pointer in the old block,
					  which points to its new residence.
				   iii) Modify the bin containing the old pointer to
					  hold now the new pointer into copy space.
				   iv)  Recurse for each pointer in the block.

			 b)	If the old block already has a forwarding pointer...

				   i)	  Modify the bin containing the old pointer to
					  now hold the new (forwarding) pointer (into
					  copy space).

	     2)   If the given block is in ~copy space...

			 a)	If it's marked, do nothing

		   b)	If it's not marked...

				   i)	  Mark it.
				   ii)  Recurse for each pointer in the block

	     3)   If the block is in ~new space, do nothing.

	  When the scavenger is done, we know that all blocks in copy space
	  reference no block in old space:	Each bin containing a pointer into
	  old space has been replaced by a newer pointer into copy space.	 Since
	  the scavenger recurses upon itself for all pointers in any block, all
	  reachable blocks have been moved from old to copy space.

	  We are almost ready to reclaim the entirety of old space, as nothing
	  in it is referenced.	This would be completely true, except that some
	  block in ~new space might point to a block in old space.	The new
	  block could have been created ~before the scavenger copied that old
	  block into copy space.

	  To assure that new space, like copy space, references no block in old
	  space, we attach another responsibility upon the allocation of new
	  blocks.  Whenever a new block is created, and has been filled with
	  data, all pointers in the block are examined immediately.	 Each
	  pointer to a block in old space forces the immediate copying of that
	  old block into copy space.	Thus, the creation of a new block might
	  take a little longer as all its pointers into old space must be
	  replaced by pointers into copy space.  The copying of referenced
	  blocks from old to copy space is part of the block allocation cost.

	  Thus, it is always true that blocks in new space ~never point to blocks
	  in old space.  Given this, and the fact that when the scaventer is
	  done, no block in copy space points to old space, we rendered ~all
	  blocks in old space as unreferenced.


36.3
The Flip

	  When no block in old space is referenced, i.e., when the scavenger is
	  done, we have the situation shown in figure 36.2(b).  Old space is
	  empty, and all blocks in use reside in copy or new space.	 At this
	  point we ~flip.

	  New and copy space together are now declared to be old space (figure
	  36.2(c)).	 The free space that used to be old space is now reused, as
	  the new new and copy spaces.

	  We are now left with the situation we started with.	 All blocks are in
	  old space, and new and copy spaces are empty.	 Now, as before, we
	  initiate the scavenger to copy all reachable blocks from their
	  present residence in old space into copy space.  Also, new blocks
	  requested by the running user program come out of new space.

		    BOX:	From which space are blocks allocated to the running
		    BOX:	program?
		    BOX:
		    BOX:	The scavenger does what?
		    BOX:
		    BOX:	Why is old space completely empty at some time?
		    BOX:
		    BOX:	What is the flip?


36.4
The Cost

	  Figure 36.2 is drawn to scale if we assume that about half of old space
	  is garbage (is unreachable).  We've drawn copy space half as large as
	old space.  Copy space must be large enough to hold all reachable
	blocks in old space.

	How big must new space be?  It must contain enough blocks to feed the
	user program during the entirety of a garbage collection.  The user
	program may eat new blocks at a maximum rate:  Conceivably, it could
	eat one block in the time it takes to copy one block from old to
	copy space.  Such a rate of consumption requires new space to be as
	large as copy space, as shown in the figure.

	Just after a flip, new and copy space, which consume about half of
	memory, are empty.  Let's continue to assume that half of old space
	  is garbage.  The amount of memory consumed by blocks in use is thus
	  about one quarter of memory.

	  Our earlier method of variable-sized block memory management didn't
	require an empty copy and new space at any one time.  It got by with
	just one space the size of old space.  Thus, our earlier method
	required only half as much memory as this incremental method.  However,
	the incremental method spreads the cost of garbage collection evenly
	over time, without pauses for stand-alone garbage collection.

	Again, the need for incremental garbage collection is great when we
	use virtual memory (not real memory) to house all blocks.  Greater
	efficiency is afforded by our core/disk segregation method,
	where all garbage collections are limited to real (core) memory.  Only
	very rarely were garbage collections of the disk required.
	The immediate, in-core garbage collections were independent
	of the size of the database, which is ~not true with this
	homogeneous method of relying on virtual memory to hold the entire
	database.


36.5
Static Space

	The copying of some large blocks can render an unexpectedly large
	expense.  For example, a huge block might be a bitmap for a color
	video screen.  The repeated copying of that much memory becomes
	noticable.

	For long-lived blocks, like bit maps, another space can be invented,
	called ~static space.  Blocks in static space are never garbage
	collected, except upon specific request for such.  The blocks are
	all considered in use.  Although they are not copied, the scavenger's
	  trip thru all reachable blocks will necessarily follow all pointers
	  from all blocks in static space.


36.6
Ephemeral Garbage Collection

	  We encountered an example of ~ephemeral, or ~partial, garbage
	  collection in Chapter 35.  Ephemeral GC can be applied
	  also to the homogeneous, virtual memory GC.

	  The essence of ephemeral GC is the avoidance of traversing thru the
	  whole database.	 Older and newer data are segregated into two spaces
	  (the "older, kept files" and the "newer, replaced files").  The
	  chasing of pointers is limited to the newer space, and hence the cost
	  of ephemeral GC is somewhat independent of the size of the entire
	  database.

	  To actually reach all reachable newer blocks, old blocks that have been
	  modified objectively must be taken as additional roots.  (Objective
	  modification in LISP occurs via its operators RPLACA and RPLACD,
	  which are similar to ICL's @-operator).

	With piggy-back VM files, all objectively modified blocks came to rest
	in the written VM file.  For the homogeneous, virtual memory method of
	GC, one must still note older blocks that have been objectively
	modified.  In SYMBOLIC's LISP machine, the ~page(s) of virtual memory
	  containing the modified old block are marked specially.  All pointers
	  from all such marked pages are now introduced as more ~roots from which
	  GC (of the newer space) commences.

	  Over time, ephemeral garbage collection becomes less and less
	  effective.  More roots are taken unconditionally, and the conservative
	  guesses ~retain more and more garbage.

	  However, ephemeral garbage collection usually performs better than
	  expected.	 Recently generated garbage is almost always fully
	  collected.  This fact serves well the observed behavior of blocks.
	  Most allocated blocks cease to be referenced relatively soon after
	  their allocations.  ~Much garbage is in fact recent, and so will
	  be freed by ephemeral garbage collection.

		BOX:	How much extra memory is used by incremental garbage
		BOX:	collection relative to that used in Chapter 34's
		BOX:	variable-sized memory management.
		BOX:
		BOX:	How big must new space be?
		BOX:
		BOX:	Why have a static space?
		BOX:
		BOX:	Why is incremental garbage collection required when
		BOX:	relying on virtual memory to hold the whole database?
		BOX:
		BOX:	Which is more efficient overall, the disk/core
		BOX:	segregated method or the incremental garbage collection
		BOX:	method?