The term “GC-hole” points to a special class of bugs that bedevils the CLR. The GC hole is a pernicious bug because it is easy to introduce by accident, repros rarely and is very tedious to debug. A single GC-hole can suck up weeks of dev and test time.
Lets discuss, What is GC-Hole? And How to create it?
First, some background. One of the major features of the CLR is the Garbage Collection system. That means that allocated objects, as seen by a managed application, are never freed explicitly by the programmer. Instead, the CLR periodically runs a Garbage Collector (GC). The GC discards objects that are no longer in use. Also, the GC compacts the heap to avoid unused holes in memory. Therefore, a managed object does not have a fixed address. Objects move around according to the whims of the garbage collector.
To do its job, the GC must be told about every reference to every GC object. The GC must know about every stack location, every register and every non-GC data structure that holds a pointer to a GC object. These external pointers are called “root references.”
Armed with this information, the GC can find all objects directly referenced from outside the GC heap. These objects may in turn, reference other objects – which in turn reference other objects and so on. By following these references, the GC finds all reachable (“live”) objects. All other objects are, by definition, unreachable and therefore discarded. After that, the GC may move the surviving objects to reduce memory fragmentation. If it does this, it must, of course, update all existing references to the moved object.
Any time a new object is allocated, a GC may occur. GC can also be explicitly requested by calling the GarbageCollect function directly. GC’s do not happen asynchronously outside these events but since other running threads can trigger GC’s, your thread must act as if GC’s are aynchronous unless you take specific steps to synchronize with the GC. More on that later.
So now, we can define a GC hole. A GC hole occurs when code inside the CLR creates a reference to a GC object, neglects to tell the GC about that reference, performs some operation that directly or indirectly triggers a GC, then tries to use the original reference. At this point, the reference points to garbage memory and the CLR will either read out a wrong value or corrupt whatever that reference is pointing to.
The code snippet below is the simplest way to introduce a GC hole into the system.
//OBJECTREF is a typedef for Object*.
PointerTable *pTBL = o_pObjectClass->GetPointerTable();
OBJECTREF aObj = AllocateObjectMemory(pTBL);
OBJECTREF bObj = AllocateObjectMemory(pTBL);
//WRONG!!! “aObj” may point to garbage if the second
//“AllocateObjectMemory” triggered a GC.
DoSomething (aOb, bObj);
All it does is allocate two managed objects, and then does something with them both.
This code compiles fine, and if you run simple pre-checkin tests, it will probably “work.” But this code will crash eventually.
Why? If the second call to “AllocateObjectMemory” triggers a GC, that GC discards the object instance you just assigned to “aObj”. This code, like all C++ code inside the CLR, is compiled by a non-managed compiler and the GC cannot know that “aObj” holds a root reference to an object you want kept live.
This point is worth repeating. The GC has no intrinsic knowledge of root references stored in local variables or non-GC data structures maintained by the CLR itself. You must explicitly tell the GC about them.