[Mono-gc-list] Copying or Mark-Compact collector?
David Jeske
jeske@chat.net
Tue, 12 Aug 2003 13:01:27 -0700
On Tue, Aug 12, 2003 at 10:56:40AM +0200, Fernando Diaz wrote:
> My research about a Garbage collector for Mono is advancing. Now is
> the moment to choose between the classical algorithms.
I think we should be very careful about a common pitfall. Current
environments with GC almost all have unacceptable pauses for
interactive applications. This includes Java and MS.NET. Worse, since
CPU speed has increased far faster than memory speed, this situation
is getting worse not better.
http://mozart.chat.net/~jeske/Projects/PauseFreeGC/
One of the biggest problems with modern collectors is that their work
is proportional to heap size, so when the heap gets big, their
throughput degrades. Furthermore, they almost always do 'stop the
world' collection of something, and pause the application.
> I think that the better techniques are Copying and
> Mark-Compact. Since my point of view their behavior are very
> similar. What do you think about?.
IMO, Copying/Compacting collectors are 'not good'. The performance
overhead required to use handles instead of pointers (like the JVM) or
to track down and update all pointers, in order to facilitate moving
objects, is very undesirable. Copying/Compacting collectors are
solving a problem which does not need to be solved, namely allocation
time and fragmentation. Most C/C++ applications are not suffering from
either of these. If mono had performance similar to C++ I would be
thrilled.
I think the contenders for a low-pause collector are:
1) tracing/copying generational and incremental - see OCaml, Modula3
2) tracing generational and fully concurrent
3) hybrid reference counting plus stack tracing plus
fully concurrent generational cycle finding - see Python and research
I'm particularly fond of #3, and here is why:
While referencing counting might seems like "old technology", the most
popular automatic memory management systems are all reference
counted. This includes Visual Basic (<=VB6), COM, Perl, Python,
PHP4. Reference counting is very effective because (a) it does not
pause, and (b) non-cyclic collection is not proportional to heap-size.
Reference counting is often criticised for having expensive
write-barriers (i.e. incrementing and decrementing counters). However,
read operations are far more important that write operations, so these
same criticisms should bar using handle-based copying collectors which
slow down all reads through handle dereferencing.
Reference counting normally imposes the constraint that the programmer
not introduce cycles. This is annoying, but at least it is possible to
write programs which do not pause. In most modern GC systems, avoiding
pauses entirely is not possible. Reference counting can fallback to a
tracing GC style cycle-finding to clean up cyclic garbage. IMO, it is
easier to make this cycle-finding incremental and non-pausing because
it is the secondary collection scheme. Programmers who cannot affort
the background cycle finder can just turn it off and write their
programs without cycles.
Using a hybrid scheme which traces the stack and registers allows the
most common operations (i.e. handling data in local variables and
arguments) to occur without doing the work of reference counting.
----
However, whatever collection system we choose to implement, the goal
should be to allow the user to tradeoff between worst-case pause times
and memory usage. Currently there are many applications which simply
can't be written in Java or C# because the worst-case pause times are
unacceptable.
--
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske@chat.net