
Java has an automatic GC that, once in a while, Stops The World (STW) but takes care of garbage on the heap. C/C++ applications don't have these STW freezes, yet their memory usage doesn't grow infinitely either. How is this behavior achieved? How are the dead objects taken care of?

Robert Harvey
Ju Shua



The programmer is responsible for ensuring that objects they created via new are deleted via delete. If an object is created, but not destroyed before the last pointer or reference to it goes out of scope, it falls through the cracks and becomes a Memory Leak.

Unfortunately for C, C++ and other languages which do not include a GC, this simply piles up over time. It can cause an application or the system to run out of memory and be unable to allocate new blocks of memory. At this point, the user must resort to ending the application so that the Operating System can reclaim that used memory.

As for mitigating this problem, several things make a programmer's life much easier. These are primarily supported by the nature of scope.

int main()
{
    int* variableThatIsAPointer = new int;
    int variableInt = 0;

    delete variableThatIsAPointer;
}

Here, we created two variables. They exist in block scope, as defined by the {} curly braces. When execution moves out of this scope, these variables are automatically destroyed. In this case, variableThatIsAPointer, as its name implies, is a pointer to an object in memory. When it goes out of scope, the pointer itself is destroyed, but the object it points to remains. Here, we delete this object before the pointer goes out of scope to ensure that there is no memory leak. However, we could also have passed this pointer elsewhere and expected it to be deleted later on.
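To make the scope rule concrete, here is a minimal sketch. Tracer, destroyedLog and scopeDemo are hypothetical names used only for illustration: each Tracer records its name when its destructor runs, which shows that an object is destroyed exactly when its enclosing block ends.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical log of destructor calls, used only to make the order visible.
std::vector<std::string> destroyedLog;

// Hypothetical helper type: records its name when its destructor runs.
struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) {}
    ~Tracer() { destroyedLog.push_back(name); }
};

void scopeDemo() {
    Tracer outer("outer");
    {
        Tracer inner("inner");
    }                                  // inner's block ends: inner is destroyed here
    destroyedLog.push_back("between");
}                                      // outer's scope ends: outer is destroyed here
```

After one call to scopeDemo(), destroyedLog holds "inner", "between", "outer" in that order; no explicit cleanup code was written for either object.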

This nature of scope extends to classes:

class Foo
{
public:
    int bar;       // destroyed automatically when Foo is destroyed
    int* otherBar; // only the pointer is destroyed; the object it points to still needs delete
};

Here, the same principle applies. We don't have to worry about bar when Foo is destroyed. However, for otherBar, only the pointer is destroyed. If otherBar is the only valid pointer to whatever object it points to, we should probably delete it in Foo's destructor. This is the driving concept behind RAII:

Resource allocation (acquisition) is done during object creation (specifically initialization), by the constructor, while resource deallocation (release) is done during object destruction (specifically finalization), by the destructor. Thus the resource is guaranteed to be held between when initialization finishes and finalization starts (holding the resources is a class invariant), and to be held only when the object is alive. Thus if there are no object leaks, there are no resource leaks.
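Applied to the Foo class above, a minimal sketch might look like this, assuming Foo is the sole owner of the int that otherBar points to. The constructor taking an int and the fooDemo function are hypothetical additions for illustration.

```cpp
// Sketch of Foo following RAII, assuming Foo is the sole owner of *otherBar.
class Foo
{
public:
    int bar = 0;   // destroyed automatically along with Foo
    int* otherBar; // owned resource: acquired in the constructor, released in the destructor

    explicit Foo(int value) : otherBar(new int(value)) {}
    ~Foo() { delete otherBar; }

    // Copying would leave two Foo objects deleting the same int, so forbid it.
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;
};

// Hypothetical usage: no delete appears here, yet nothing leaks.
int fooDemo()
{
    Foo foo(42);
    return *foo.otherBar;
}   // foo's destructor runs here and deletes otherBar
```

In practice, declaring otherBar as std::unique_ptr<int> would give the same guarantee without a handwritten destructor or deleted copy operations.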

RAII is also the typical driving force behind smart pointers. In the C++ Standard Library, these are std::unique_ptr, std::shared_ptr, and std::weak_ptr, although I have seen and used other shared_ptr/weak_ptr implementations that follow the same concepts. A std::unique_ptr is the sole owner of its object and deletes it when the unique_ptr itself is destroyed. With std::shared_ptr, a reference count tracks how many shared_ptrs own a given object, and the object is deleted automatically once the last of them is gone; a std::weak_ptr refers to such an object without keeping it alive.
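The ownership rules of the three standard smart pointers can be sketched as follows; Widget, uniqueDemo and sharedDemo are hypothetical names chosen for this illustration.

```cpp
#include <memory>
#include <utility>

struct Widget { int value = 7; };  // hypothetical example type

// std::unique_ptr: exactly one owner; ownership can be moved but not copied.
bool uniqueDemo() {
    std::unique_ptr<Widget> a = std::make_unique<Widget>();
    std::unique_ptr<Widget> b = std::move(a);  // a is now empty, b owns the Widget
    return a == nullptr && b->value == 7;
}   // b is destroyed here and deletes the Widget

// std::shared_ptr counts owners; std::weak_ptr observes without owning.
long sharedDemo(std::weak_ptr<Widget>& observer) {
    std::shared_ptr<Widget> first = std::make_shared<Widget>();
    observer = first;                            // does not change the count
    {
        std::shared_ptr<Widget> second = first;  // two owners now
    }                                            // second destroyed: one owner left
    return first.use_count();
}   // first destroyed: count reaches zero and the Widget is deleted
```

sharedDemo returns 1, and once it has returned, the weak_ptr passed in reports expired(), because the last owning shared_ptr was destroyed at the end of the function.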

Beyond that, it all comes down to proper practices and discipline for a programmer to ensure that their code handles objects properly.

jotik
BlueBuddy

C++ does not have garbage collection.

C++ applications are required to dispose of their own garbage.

C++ application programmers are required to understand this.

When they forget, the result is called a "memory leak".


In C, C++ and other systems without a Garbage Collector, the developer is offered facilities by the language and its libraries to indicate when memory can be reclaimed.

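The most basic facilities are static and automatic storage. In these cases, the language itself ensures that items are disposed of:

int global = 0; // static storage: lives until the program ends

int foo(int a, int b) {   // parameters a and b: automatic storage
    static int local = 1; // static storage: initialized once, lives until the program ends

    int c = a + b; // automatic storage: destroyed when foo returns

    return c;
}

In these cases, the compiler is in charge of knowing when those values are no longer needed and of reclaiming the storage associated with them: automatic variables when their scope ends, static ones when the program ends.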

When using dynamic storage, in C, memory is traditionally allocated with malloc and reclaimed with free. In C++, memory is traditionally allocated with new and reclaimed with delete.

C has not changed much over the years; modern C++, however, largely eschews explicit new and delete and relies instead on library facilities (which themselves use new and delete appropriately):

  • smart pointers are the most famous: std::unique_ptr and std::shared_ptr
  • but containers are much more widespread actually: std::string, std::vector, std::map, ... all internally manage dynamically allocated memory transparently
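As a small illustration of the container point, here is a sketch (totalLength is a hypothetical name) in which a std::vector of std::string allocates plenty of heap memory, yet the function contains no new or delete at all.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The vector and the strings it holds allocate heap memory as they grow;
// their destructors free it, so this function contains no new/delete.
std::size_t totalLength(int count) {
    std::vector<std::string> words;
    for (int i = 0; i < count; ++i) {
        words.push_back(std::string(100, 'x'));  // each string owns its own heap buffer
    }
    std::size_t total = 0;
    for (const std::string& w : words) total += w.size();
    return total;
}   // words is destroyed here, which destroys every string and frees all buffers
```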

Speaking of shared_ptr, there is a risk: if a cycle of references is formed and not broken, then memory can leak. It is up to the developer to avoid this situation, the simplest way being to avoid shared_ptr altogether and the second simplest being to avoid cycles at the type level.

As a result, memory leaks are largely a non-issue in modern C++, even for new users, as long as they refrain from using raw new, delete or std::shared_ptr. This is unlike C, where staunch discipline is necessary and generally insufficient.
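The shared_ptr cycle risk can be sketched with a hypothetical Node type that counts its live instances in aliveNodes. makeCycle links two nodes with owning pointers in both directions; makeSafePair uses a std::weak_ptr for the back link instead, which is one common way (not the only one) to break such cycles.

```cpp
#include <memory>

int aliveNodes = 0;  // hypothetical counter of Node objects not yet destroyed

struct Node {
    std::shared_ptr<Node> next;  // owning link
    std::weak_ptr<Node> prev;    // non-owning back link
    Node() { ++aliveNodes; }
    ~Node() { --aliveNodes; }
};

// Owning pointers in both directions: after the locals go away, each node is
// still owned by the other, so neither count reaches zero and both stay alive.
std::weak_ptr<Node> makeCycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;   // the cycle
    return a;      // a weak_ptr lets the caller observe the leak
}

// Same shape, but the back link is a weak_ptr, so both nodes are freed.
void makeSafePair() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->prev = a;   // does not keep a alive
}
```

After makeSafePair(), aliveNodes is back to 0. After makeCycle(), both nodes are unreachable from the caller's code yet aliveNodes is 2; the cycle only goes away if someone breaks it by hand, e.g. w.lock()->next.reset() on the returned weak_ptr w.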


However, this answer would not be complete without mentioning the twin-sister of memory leaks: dangling pointers.

A dangling pointer (or dangling reference) is a hazard created by keeping a pointer or reference to an object that is dead. For example:

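#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec;
    vec.push_back(1);     // vec: [1]

    int& a = vec.back();  // a refers to the element stored inside vec

    vec.pop_back();       // vec: [], the element is destroyed and "a" is now dangling

    std::cout << a << "\n"; // Undefined Behavior: reads a destroyed object
}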

Using a dangling pointer or reference is Undefined Behavior. Sometimes, luckily, this results in an immediate crash; quite often, unfortunately, it corrupts memory first... and from time to time weird behavior crops up because the compiler, which assumes Undefined Behavior never happens, emits really weird code.
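One simple way to avoid the dangling reference in the example above is to copy the value out before the element is destroyed. A minimal sketch, with lastValueAfterPop as a hypothetical name:

```cpp
#include <vector>

// Copy the value out before the element is destroyed, instead of keeping a
// reference into the vector's storage.
int lastValueAfterPop() {
    std::vector<int> vec;
    vec.push_back(1);        // vec: [1]
    int a = vec.back();      // a is an independent copy, not a reference
    vec.pop_back();          // vec: [], a is unaffected
    return a;
}
```

Note that removing elements is not the only hazard: a push_back that reallocates the vector's storage also invalidates every reference and pointer into it.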

Undefined Behavior is the biggest issue with C and C++ to this day, in terms of security/correctness of programs. You might want to check out Rust for a language with no Garbage Collector and no Undefined Behavior outside of unsafe code.

Matthieu M.

C++ has this thing called RAII. Basically it means garbage gets cleaned up as you go rather than leave it in a pile and let the cleaner tidy up after you. (imagine me in my room watching the football - as I drink cans of beer and need new ones, the C++ way is to take the empty can to the bin on the way to the fridge, the C# way is to chuck it on the floor and wait for the maid to pick them up when she comes to do the cleaning).

Now, it is possible to leak memory in C++, but doing so requires you to leave the usual constructs and revert to the C way of doing things: allocating a block of memory and keeping track of where that block is without any language assistance. Some people lose track of this pointer and so cannot free the block.

gbjbaanb

It should be noted that it is, in the case of C++, a common misconception that "you need to do manual memory management". In fact, you don't usually do any memory management in your code.

Fixed-size objects (with scope lifetime)

In the vast majority of cases when you need an object, the object will have a defined lifetime in your program and is created on the stack. This works for all built-in primitive data types, but also for instances of classes and structs:

class MyObject {
public:
    virtual ~MyObject() = default; // lets derived objects be deleted through a MyObject pointer
    int x;
};

int objTest()
{
    MyObject obj;
    obj.x = 5;
    return obj.x;
}

Stack objects are automatically removed when the function ends. In Java, objects are always created on the heap, and therefore have to be removed by some mechanism like garbage collection. This is a non-issue for stack objects.

Objects that manage dynamic data (with scope lifetime)

Using space on the stack works for objects of a fixed size. When you need a variable amount of space, such as an array, another approach is used: the data is encapsulated in a fixed-size object which manages the dynamic memory for you. This works because objects can have a special cleanup function, the destructor. It is guaranteed to be called when the object goes out of scope, and it does the opposite of the constructor:

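#include <cstddef>

class MyList {
public:
    // a fixed-size pointer to the actual memory.
    int* listOfInts;
    // constructor: get memory
    explicit MyList(std::size_t numElements) { listOfInts = new int[numElements]; }
    // destructor: free memory
    ~MyList() { delete[] listOfInts; }
    // no copying: two copies would delete[] the same array twice
    MyList(const MyList&) = delete;
    MyList& operator=(const MyList&) = delete;
};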

int listTest()
{
    MyList list(1024);
    list.listOfInts[200] = 5;
    return list.listOfInts[200];
    // When list goes out of scope here, its destructor is called and frees the memory.
}

There is no memory management at all in the code where the memory is used. The only thing we need to ensure is that the class we wrote has a suitable destructor. No matter how we leave the scope of listTest, be it via an exception or simply by returning from it, the destructor ~MyList() will be called, and we don't need to manage any memory.

(I think it is a funny design decision to use the bitwise NOT operator, ~, to indicate the destructor. When used on numbers, it inverts the bits; by analogy, here it indicates that what the constructor did is inverted.)

Basically all C++ objects which need dynamic memory use this encapsulation. It has been called RAII ("resource acquisition is initialization"), which is quite a weird way to express the simple idea that objects care about their own contents; what they acquire is theirs to clean up.
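The exception case mentioned for listTest can be checked with a small sketch. Buffer is a hypothetical stand-in for a class like MyList that counts its live instances in liveBuffers, and mayThrow is a hypothetical function that throws on request.

```cpp
#include <memory>
#include <stdexcept>

int liveBuffers = 0;  // hypothetical counter of Buffer objects not yet destroyed

struct Buffer {       // hypothetical stand-in for a class like MyList
    Buffer() { ++liveBuffers; }
    ~Buffer() { --liveBuffers; }
};

void mayThrow(bool fail) {
    if (fail) throw std::runtime_error("something went wrong");
}

// Stack object: its destructor runs on a normal return and also while the
// exception from mayThrow propagates out of this function (stack unwinding).
void scoped(bool fail) {
    Buffer buf;
    mayThrow(fail);
}

// Heap object owned by a unique_ptr: deleted on both paths as well.
void heapButOwned(bool fail) {
    auto buf = std::make_unique<Buffer>();
    mayThrow(fail);
}
```

Whether these functions return normally or throw (and the exception is caught by a caller), liveBuffers ends up back at 0. By contrast, the raw version `Buffer* p = new Buffer; mayThrow(true); delete p;` would skip the delete and leak the Buffer.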

Polymorphic objects and lifetime beyond scope

Now, both of these cases were for memory which has a clearly defined lifetime: The lifetime is the same as the scope. If we do not want an object to expire when we leave the scope, there is a third mechanism which can manage memory for us: a smart pointer. Smart pointers are also used when you have instances of objects whose type varies at runtime, but which have a common interface or base class:

#include <memory>

class MyDerivedObject : public MyObject {
    public: int y;
};
std::unique_ptr<MyObject> createObject()
{
    // actually creates an object of a derived class,
    // but the user doesn't need to know this.
    return std::make_unique<MyDerivedObject>();
}

int dynamicObjTest()
{
    std::unique_ptr<MyObject> obj = createObject();
    obj->x = 5;
    return obj->x;
    // At scope end, the unique_ptr automatically deletes the object it owns,
    // calling its destructor (MyObject needs a virtual destructor for this to be well-defined).
}

There is another kind of smart pointer, std::shared_ptr, for sharing objects among several clients. It deletes its contained object only when the last shared_ptr owning it is destroyed, so it can be used in situations where it is completely unknown how many clients there will be and how long they will use the object.

In summary, we see that you don't really do any manual memory management. Everything is encapsulated and then taken care of by completely automatic, scope-based memory management. In the cases where this is not enough, smart pointers, which encapsulate raw memory, are used.

It is considered extremely bad practice to use raw pointers as resource owners anywhere in C++ code, raw allocations outside of constructors, and raw delete calls outside of destructors, as they are almost impossible to manage when exceptions occur, and generally hard to use safely.

The best: this works for all types of resources

One of the biggest benefits of RAII is that it's not limited to memory. It actually provides a very natural way to manage resources such as files and sockets (opening/closing) and synchronization mechanisms such as mutexes (locking/unlocking). Basically, every resource that can be acquired and must be released is managed in exactly the same way in C++, and none of this management is left to the user. It is all encapsulated in classes which acquire in the constructor and release in the destructor.

For example, a function locking a mutex is usually written like this in C++:

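#include <mutex>
std::mutex myMutex;
void doSynchronizedStuff(); // assumed to be defined elsewhere

void criticalSection() {
    std::scoped_lock lock(myMutex); // scoped_lock (C++17) locks the mutex
    doSynchronizedStuff();
} // lock is destroyed here, which unlocks myMutex, even if an exception was thrown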

Other languages make this much more complicated, by either requiring you to do this manually (e.g. in a finally clause) or they spawn specialized mechanisms which solve this problem, but not in a particularly elegant way (usually later in their life, when enough people have suffered from the shortcoming). Such mechanisms are try-with-resources in Java and the using statement in C#, both of which are approximations of C++'s RAII.
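The same pattern covers files. Here is a minimal sketch, assuming a C-style FILE* is the resource: a std::unique_ptr with a hypothetical FileCloser deleter calls std::fclose automatically, the way scoped_lock unlocks its mutex. FileHandle and roundTrip are also hypothetical names.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Deleter that closes the file; unique_ptr calls it when the handle is destroyed.
struct FileCloser {
    void operator()(std::FILE* f) const { if (f) std::fclose(f); }
};
using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

// Writes text to a temporary file and reads it back; returns "" on failure.
std::string roundTrip(const std::string& text) {
    FileHandle file(std::tmpfile());   // acquire: open
    if (!file) return "";
    std::fputs(text.c_str(), file.get());
    std::rewind(file.get());
    char buffer[256] = {};
    if (!std::fgets(buffer, sizeof buffer, file.get())) return "";
    return buffer;
}   // release: FileCloser calls std::fclose here, on every return path
```

roundTrip("hello") returns "hello", and the file is closed on each of the three return paths without a single explicit fclose call in the function body.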

So, to sum it up, all of this was a very superficial account of RAII in C++, but I hope that it helps readers to understand that memory and even resource management in C++ is not usually "manual", but actually mostly automatic.

Felix Dombek

With respect to C specifically, the language gives you no tools to manage dynamically-allocated memory. You are absolutely responsible for making sure every *alloc has a corresponding free somewhere.

Where things get really nasty is when a resource allocation fails midway through; do you try again, do you roll back and start over from the beginning, do you roll back and exit with an error, do you just bail outright and let the OS deal with it?

For example, here's a function to allocate a non-contiguous 2D array. The behavior here is that if an allocation failure occurs midway through the process, we roll everything back and return an error indication using a NULL pointer:

#include <stdlib.h> /* malloc, free */

/**
 * Allocate space for an array of arrays; returns NULL
 * on error.
 */
int **newArr( size_t rows, size_t cols )
{
  int **arr = malloc( sizeof *arr * rows );
  size_t i;

  if ( arr ) // malloc returns NULL on failure
  {
    for ( i = 0; i < rows; i++ )
    {
      arr[i] = malloc( sizeof *arr[i] * cols );
      if ( !arr[i] )
      {
        /**
         * Whoopsie; we can't allocate any more memory for some reason.
         * We can't just return NULL at this point since we'll lose access
         * to the previously allocated memory, so we branch to some cleanup
         * code to undo the allocations made so far.  
         */
        goto cleanup;
      }
    }
  }
  goto done;

/**
 * We encountered a failure midway through memory allocation,
 * so we roll back all previous allocations and return NULL.
 */
cleanup:
  while ( i )         // this is why we didn't limit the scope of i to the for loop
    free( arr[--i] ); // delete previously allocated rows
  free( arr );        // delete arr object
  arr = NULL;

done:
  return arr;
}

This code is butt-ugly with those gotos, but in the absence of any sort of structured exception handling mechanism, this is pretty much the only way to deal with the problem without just bailing out completely, especially if your resource allocation code is nested more than one loop deep. This is one of the very few times where goto is actually an attractive option; otherwise you're using a bunch of flags and extra if statements.

You can make life easier on yourself by writing dedicated allocator/deallocator functions for each resource, something like:

Foo *newFoo( void )
{
  Foo *foo = malloc( sizeof *foo );
  if ( foo )
  {
    foo->bar = newBar();
    if ( !foo->bar ) goto cleanupBar;
    foo->bletch = newBletch(); 
    if ( !foo->bletch ) goto cleanupBletch;
    ...
  }
  goto done;

cleanupBletch:
  deleteBar( foo->bar );
  // fall through to clean up the rest

cleanupBar:
  free( foo );
  foo = NULL;

done:
  return foo;
}

void deleteFoo( Foo *f )
{
  deleteBar( f->bar );
  deleteBletch( f->bletch );
  free( f );
}
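For contrast with newArr, here is a rough C++ analogue as a sketch; makeArr is a hypothetical name. It reports failure differently: instead of returning NULL, an allocation failure throws std::bad_alloc, and the rows built so far are destroyed automatically during stack unwinding, so no cleanup labels are needed.

```cpp
#include <cstddef>
#include <vector>

// Rough C++ analogue of newArr: if any row's allocation throws std::bad_alloc,
// the rows already built are destroyed automatically during stack unwinding.
std::vector<std::vector<int>> makeArr(std::size_t rows, std::size_t cols) {
    std::vector<std::vector<int>> arr;
    arr.reserve(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        arr.emplace_back(cols);   // row i: cols zero-initialized ints
    }
    return arr;
}
```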
John Bode

I've learned to classify memory issues into a number of different categories.

  • One-time drips. Suppose a program leaks 100 bytes at startup time, never to leak again. Chasing down and eliminating those one-time leaks is nice (I do like having a clean report from a leak detection tool) but is not essential. Sometimes there are bigger problems that need to be attacked.

  • Repeated leaks. A function that is called repeatedly during the course of a program's lifespan and that regularly leaks memory is a big problem. These drips will torture the program, and possibly the OS, to death.

  • Mutual references. If objects A and B reference one another via shared pointers, you have to do something special, either in the design of those classes or in the code that implements/uses those classes to break the circularity. (This is not a problem for garbage collected languages.)

  • Remembering too much. This is the evil cousin of garbage / memory leaks. RAII will not help here, nor will garbage collection. This is a problem in any language. If some active variable has a pathway that connects it to some random chunk of memory, that random chunk of memory is not garbage. Making a program become forgetful so it can run for several days is tricky. Making a program that can run for several months (e.g., until the disk fails) is very, very tricky.

I have not had a serious problem with leaks for a long, long time. Using RAII in C++ very much helps address those drips and leaks. (One however does have to be careful with shared pointers.) Much more importantly I've had problems with applications whose memory use keeps on growing and growing and growing because of unsevered connections to memory that is no longer of any use.
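One way to make a program deliberately forgetful is to bound whatever it remembers. The sketch below uses a hypothetical RecentLog class that keeps only the newest entries; everything it still holds is reachable, so neither RAII nor a garbage collector would free it, and the program has to decide to drop old data itself.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical event log that keeps only the most recent entries.
class RecentLog {
public:
    explicit RecentLog(std::size_t capacity) : capacity_(capacity) {}

    void add(std::string entry) {
        entries_.push_back(std::move(entry));
        if (entries_.size() > capacity_) {
            entries_.pop_front();   // deliberately forget the oldest entry
        }
    }

    std::size_t size() const { return entries_.size(); }
    const std::string& oldest() const { return entries_.front(); }

private:
    std::size_t capacity_;
    std::deque<std::string> entries_;
};
```

Adding the entries "0" through "9" to a RecentLog with capacity 3 leaves only "7", "8" and "9", so its memory use stays flat no matter how long the program runs.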

David Hammen

It is up to the C++ programmer to implement his/her own form of garbage collection where necessary. Failure to do so will result in what is called a 'memory leak'. It is pretty common for 'high level' languages (such as Java) to have built-in garbage collection, but 'low level' languages such as C and C++ do not.