Memory Exhaustion

Memory Exhaustion occurs when a given process or task has run out of virtual memory: the OperatingSystem will not or cannot provide any more. It is only a symptom; the causes and remedies are more interesting.
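How exhaustion surfaces inside a program depends on the language and the platform. As a minimal C++ sketch (the helper name can_allocate is made up for illustration), a failed request shows up either as a std::bad_alloc exception or, with the std::nothrow form used here, as a null pointer. On systems that overcommit memory the request may succeed anyway, and the failure only appears later when the pages are actually touched.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical probe: returns true if the runtime hands out `bytes` bytes.
// The nothrow form of operator new reports failure as nullptr instead of
// throwing std::bad_alloc.
bool can_allocate(std::size_t bytes) {
    void* block = ::operator new(bytes, std::nothrow);
    if (block == nullptr) {
        return false;            // the allocator could not satisfy the request
    }
    ::operator delete(block);    // release immediately; this is only a probe
    return true;
}

// Example usage: a small request is expected to succeed on a healthy system.
void example_usage() {
    assert(can_allocate(1024));
}
```

A probe like this says nothing about why memory ran out, which is what the rest of this page is about.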

Generally, memory keeps growing because of dynamic allocation on the heap. Dynamic allocation also occurs on the stack: each call frame holds the temporary variables, passed parameters, return values, and other information needed to return to the caller. Usually each thread has its own independent stack, and different mechanisms are used to set the default limit and each thread's individual limit. Generally speaking, adjusting the stack space limits is only necessary with deep call graphs (more than 20 levels or so) or heavy recursion.
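To illustrate why heavy recursion is what usually runs into stack limits, here is a small sketch with two made-up functions that compute the same sum. The recursive form needs one call frame per level, while the loop needs a single frame regardless of input size. (An optimizing compiler may turn the recursion into a loop on its own, so treat this as an illustration of the principle, not a measurement.)

```cpp
#include <cassert>
#include <cstdint>

// Recursive form: one stack frame per level, so stack use grows with n.
std::uint64_t sum_recursive(std::uint64_t n) {
    if (n == 0) {
        return 0;
    }
    return n + sum_recursive(n - 1);
}

// Iterative form: a single frame no matter how large n is.
std::uint64_t sum_iterative(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Example usage: both forms agree on a shallow input.
void example_usage() {
    assert(sum_recursive(100) == sum_iterative(100));
}
```

Rewriting a deep recursion as a loop (or an explicit heap-allocated stack) is often an alternative to raising the stack limit.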

Causes

The causes usually fall into two main areas: programmatic or operational.

There are subcategories as well: programmatic bugs can be application-level bugs, vendor library bugs, or system bugs (in the OS and system-related tools).

Application-level bugs are more prominent in imperative languages that have explicit dynamic memory management. Other languages and runtime systems make different tradeoffs that leave them less susceptible to these sorts of problems, but their implementations are themselves SystemProgramming and can contain instances of the same problems, even when they seem largely self-bootstrapped.

Programming/Design Faults

Memory Leaks

Memory Leaks can easily occur in languages that allow for dynamic allocation to be handled explicitly by the programmer.

C and C++ are prototypical examples. In C, every block returned by malloc(), calloc(), or realloc() must be released by exactly one free() call (a successful realloc() takes over the block passed to it). In C++, each new must be matched by a delete and each new[] by a delete[], depending on whether a single object or an array was allocated. If any execution sequence doesn't adhere to these rules, memory that is no longer referenced by any of the program's variables (directly, or transitively through pointers held in objects or structures) can be left allocated. Over time, the problem can grow to the point where MemoryExhaustion occurs.
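A minimal sketch of how one execution path can break the malloc()/free() pairing. Both functions are invented for illustration, and the copy they make is deliberately pointless; it exists only so that there is something to leak. The first version returns early without freeing; the second funnels every path through a single free().

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Leaky version: the early return skips free(), so every too-short input
// leaves one block allocated with nothing pointing at it.
bool starts_with_leaky(const char* text, const char* prefix) {
    char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (copy == nullptr) {
        return false;
    }
    std::strcpy(copy, text);
    if (std::strlen(copy) < std::strlen(prefix)) {
        return false;                       // BUG: copy is never freed
    }
    bool match = std::strncmp(copy, prefix, std::strlen(prefix)) == 0;
    std::free(copy);
    return match;
}

// Fixed version: every path after a successful malloc() reaches free()
// exactly once.
bool starts_with_fixed(const char* text, const char* prefix) {
    char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (copy == nullptr) {
        return false;
    }
    std::strcpy(copy, text);
    bool match = std::strlen(copy) >= std::strlen(prefix) &&
                 std::strncmp(copy, prefix, std::strlen(prefix)) == 0;
    std::free(copy);
    return match;
}

// Example usage of the fixed version only, so the example itself leaks nothing.
void example_usage() {
    assert(starts_with_fixed("memory", "mem"));
    assert(!starts_with_fixed("me", "mem"));
}
```

Both versions return the same answers; the leak is invisible to the caller until the process has run long enough.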

A frequently mishandled execution path is error handling and exceptions. In C++, the "resource acquisition is initialization" (RAII) technique is recommended for any resource (file handles, memory, etc.). Even with the STL, std::auto_ptr only gets you so far without the various smart pointers from outside the Standard Library (e.g. Boost). C++11 later added std::unique_ptr, std::shared_ptr, and std::weak_ptr to the Standard Library and deprecated auto_ptr.
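A small sketch of the technique on the exception path. It assumes a C++11 compiler and uses std::unique_ptr, which is newer than the auto_ptr mentioned above; Buffer, process, and the live_buffers counter are hypothetical names used only to make the cleanup observable.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int live_buffers = 0;   // bookkeeping for this demo only

struct Buffer {         // hypothetical resource
    Buffer()  { ++live_buffers; }
    ~Buffer() { --live_buffers; }
};

// The unique_ptr owns the Buffer; its destructor runs during stack
// unwinding, so the throw below cannot leak it.
void process(bool fail) {
    std::unique_ptr<Buffer> buffer(new Buffer);
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
}

// Returns how many Buffers are still alive after a failed call.
int live_after_failure() {
    try {
        process(true);
    } catch (const std::runtime_error&) {
        // Error handled; there is nothing to free by hand.
    }
    return live_buffers;
}

// Example usage: neither the failing nor the normal path leaves a Buffer behind.
void example_usage() {
    assert(live_after_failure() == 0);
    process(false);
    assert(live_buffers == 0);
}
```

The point of the design is that cleanup is attached to the owning object, not to each exit path, so adding another early return or throw later cannot reintroduce the leak.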

Garbage Collection allows a program to allocate dynamically while relieving the programmer of the concern of deallocation, though there are some runtime costs that can be somewhat higher than those of explicit management. Simple ReferenceCounting can leak cyclic references, because the objects in a cycle keep each other's counts artificially inflated. Introducing MemoryPools can help some (as long as an acceptable clean-up stage can be demarcated), but they aren't a general solution. More sophisticated techniques such as MarkAndSweep and GenerationalGC require more language and system support, though there are some hacks that work non-optimally without a purist approach (e.g. the Boehm GC for C and C++, which has no direct language support). Regardless, SystemProgramming bugs in the GC implementation itself can lead to leaks as well.
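The cycle problem with simple ReferenceCounting can be reproduced with C++11's reference-counted std::shared_ptr. This is a sketch under that assumption; Node and the live_nodes counter are hypothetical. Two nodes that hold strong references to each other survive their scope, while replacing one link with a std::weak_ptr lets both be reclaimed. The strong-cycle demo breaks the cycle by hand at the end so that the example itself does not leak.

```cpp
#include <cassert>
#include <memory>
#include <utility>

int live_nodes = 0;   // bookkeeping for this demo only

struct Node {         // hypothetical type
    std::shared_ptr<Node> strong_peer;  // counted reference
    std::weak_ptr<Node> weak_peer;      // uncounted reference
    Node()  { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Two nodes pointing at each other with counted references.
int live_after_strong_cycle() {
    Node* raw = nullptr;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;   // cycle: neither count can reach zero
        raw = a.get();
    }
    int still_alive = live_nodes;   // both nodes outlived their owners
    // Break the cycle manually so this demo cleans up after itself.
    std::shared_ptr<Node> rescued = std::move(raw->strong_peer);
    return still_alive;
}

// Same shape, but the back link is weak and does not inflate the count.
int live_after_weak_link() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;
    }
    return live_nodes;
}

// Example usage: the cycle keeps two nodes alive, the weak link keeps none.
void example_usage() {
    assert(live_after_strong_cycle() == 2);
    assert(live_after_weak_link() == 0);
}
```

A tracing collector such as MarkAndSweep would reclaim the strong cycle as well, since it works from reachability and not from counts.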

Garbage Collection isn't a total dynamic memory panacea. Implementing Caches requires WeakReferences or WeakVariables; without them, every cached object stays reachable, which can lead to unanticipated and unacceptable cache growth and, in turn, to MemoryExhaustion.
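The idea carries over to reference-counted C++ as well. Below is a minimal sketch of a cache that holds only weak references, assuming C++11's std::weak_ptr; the class WeakCache and the way it fabricates values are invented for illustration. The cache never keeps a value alive by itself: once the last outside owner lets go, the entry expires and can be purged.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

class WeakCache {     // hypothetical class
public:
    // Returns the cached value if someone still holds it; otherwise builds
    // a fresh one (a stand-in for an expensive load) and remembers it weakly.
    std::shared_ptr<std::string> get(const std::string& key) {
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            if (auto cached = it->second.lock()) {
                return cached;
            }
        }
        auto fresh = std::make_shared<std::string>("value for " + key);
        entries_[key] = fresh;
        return fresh;
    }

    // Drops the bookkeeping for values that no longer exist, so the map
    // of keys does not itself grow without bound.
    void purge() {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.expired()) {
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::unordered_map<std::string, std::weak_ptr<std::string>> entries_;
};

// Example usage: the entry lives only as long as an outside owner does.
void example_usage() {
    WeakCache cache;
    std::shared_ptr<std::string> held = cache.get("config");
    assert(cache.get("config") == held);   // same object while it is held
    held.reset();
    cache.purge();
    assert(cache.size() == 0);
}
```

A purely weak cache like this forgets values as soon as nobody else uses them, which may be too eager for some workloads; that is a tradeoff of the sketch, not a recommendation.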

The AntiPattern movement has documented some BitterJava cases where references and their lifetimes become too wasteful of memory. A good number of problems can be solved automatically, but not every memory issue.

Tools

There are two flavors: static syntax analysis and runtime analysis. Static syntax analysis can be considered Lint on steroids for languages like C/C++. Runtime analysis either instruments the executable or injects replacement allocation routines (via a dynamic link library that is pre-loaded before the standard libraries) to keep track of where the allocations that end up leaked originate. The instrumented approaches are a little more invasive, but typically offer more information about other nasty C/C++ memory misdeeds (array overwrites, reads from uninitialized memory, double free, double delete, etc.) which lead to undefined behavior. On any large project, these tools can become invaluable for tracking down show-stopping problems.
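The bookkeeping idea behind the runtime tools can be sketched in a few lines. This is a toy, not how any particular product works: tracked_malloc, tracked_free, and leak_origins are hypothetical wrappers that the program must call explicitly, whereas real tools intercept the standard routines transparently. The origin labels in the usage function are made up as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Bookkeeping table: address of each live block -> label of its call site.
std::map<void*, std::string>& live_blocks() {
    static std::map<void*, std::string> table;
    return table;
}

// Allocates like malloc() and records where the request came from.
void* tracked_malloc(std::size_t bytes, const char* origin) {
    void* block = std::malloc(bytes);
    if (block != nullptr) {
        live_blocks()[block] = origin;
    }
    return block;
}

// Frees like free() and forgets the block.
void tracked_free(void* block) {
    if (block == nullptr) {
        return;
    }
    live_blocks().erase(block);
    std::free(block);
}

// Everything still in the table at report time is a suspected leak.
std::vector<std::string> leak_origins() {
    std::vector<std::string> origins;
    for (const auto& entry : live_blocks()) {
        origins.push_back(entry.second);
    }
    return origins;
}

// Example usage with made-up origin labels.
void example_usage() {
    void* kept = tracked_malloc(16, "parser.c:42");
    void* released = tracked_malloc(32, "lexer.c:7");
    tracked_free(released);
    assert(leak_origins().size() == 1);   // only the parser block remains
    tracked_free(kept);
    assert(leak_origins().empty());
}
```

Reporting at program exit which origins are still in the table is essentially what the leak reports of the real tools boil down to, with far better call-site information.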

In addition to Garbage Collection that uses ad-hoc techniques for C/C++ (Boehm type GC), some commercial products offer more automatic correction of other types of C/C++ memory sins. I have no experience running this type of product and don't know what the associated runtime overhead is; I just heard a sales pitch.

Trying to find MemoryLeaks and other C/C++ memory sins without tools is like looking for a needle in a haystack. Pay for the tools if necessary; otherwise move to a more congenial language and programming environment that makes intelligent tradeoffs that still work for your application(s).

Operational Faults

Most modern OperatingSystems use either swap partitions or swap files to supplement the amount of physical memory (RAM) that a machine holds; thus there is a finite global limit for all processes on a host. In addition, administrative limits or quotas can be placed on machine resources; the exact interface and the granularity of control differ across OperatingSystems.

In addition, certain operating systems may place restrictions on the size of the stack, where function/method calls place temporary variables, passed parameters, and return values in a stack frame.
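As one concrete example of such an interface, POSIX systems expose per-process limits through getrlimit(). The sketch below assumes a POSIX platform (it will not compile elsewhere); the wrapper read_limit and the struct LimitReading are hypothetical conveniences around the real call.

```cpp
#include <cassert>
#include <sys/resource.h>   // POSIX only

struct LimitReading {       // hypothetical result type
    bool ok;                // false if getrlimit() failed
    rlim_t soft;            // limit currently enforced
    rlim_t hard;            // ceiling the soft limit may be raised to
};

// Reads the soft and hard limit for one resource of the calling process.
LimitReading read_limit(int resource) {
    rlimit raw{};
    if (getrlimit(resource, &raw) != 0) {
        return {false, 0, 0};
    }
    return {true, raw.rlim_cur, raw.rlim_max};
}

// Example usage: the stack limit can be read, and soft never exceeds hard.
void example_usage() {
    LimitReading stack = read_limit(RLIMIT_STACK);
    assert(stack.ok);
    assert(stack.soft <= stack.hard);
}
```

RLIMIT_STACK covers the stack size discussed here and RLIMIT_AS the total address space; a value of RLIM_INFINITY means no limit is set. Other OperatingSystems use entirely different mechanisms.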

Various virtual machines have their own mechanisms for setting "internal memory" limits; running into one of those is a form of memory exhaustion as well.

Multiple Uses of Swap Space

If a swap file is configured, filling up all available space on the enclosing file system and disk can cause shortages. Clearing up some space on the existing drive can solve your immediate problem, but let's hope that something isn't going to fill it up again quickly. Sometimes one problem is the consequence of another that isn't quite as obvious as the first. Installing more disk can help, but unless it adds to the capacity of the existing drive's namespace (e.g. through RAID or hierarchical filesystem mountpoints, which can happen in a relatively transparent fashion), some configuration changes to existing applications may be necessary to take advantage of the new storage. Different types of changes introduce different types of risk; make sure an appropriate level of change control exists for the different types of computing resources that you manage. There is a big difference between your home desktop system and a 24/7, 99.9999% data center where SLAs (service level agreements) may have severe financial implications laid out in the contract.

On certain Unix variants, a swap partition can be shared with a temporary filesystem. A temporary filesystem is typically mounted at /tmp, and various applications and users can write files there. Having a dedicated swap partition may be an option to consider.

Available Global Virtual Memory / Too Low a Quota

The application is using legitimate amounts of memory, though its usage may have recently crossed a threshold over time. If you have historical information about memory usage (e.g. process accounting), a radical jump should have an explanation. A minor increase could just be a variation in the operational profile of the entire system; in that case, increase the quota (soft, hard, or user-specific) according to your OperatingSystem guidelines.

Legitimate Memory Use / Not Enough Virtual Memory

There are three basic ways to solve the problem.

  1. Increase/Add Swap Files and/or Partitions
  2. Add more Memory to the machine, if possible
  3. Partition the applications to more than one machine, if possible

Increase / Add Swap Files and/or Partitions

Only a practical measure when the aggregate total of the current swap partitions and files is less than 2-3 times physical memory.

Even if there is no other alternative, adding more swap may not dig you out of the hole.

Having too much virtual memory (swap) and not enough physical memory can lead to Thrashing, a terrible affliction of slowness that can ultimately render the machine useless.

Add more Memory to the machine, if possible

I remember adding more than 512MB to a Win98 machine; afterwards it wouldn't boot. Going to NT or 2000 was the only way to make it work with the extra memory in the slots. Fortunately most modern Operating Systems behave better, but there are plenty of esoteric issues that may limit exactly what sort of extra memory you can slap into a box.

  1. Make sure that you've got the right type of memory
  2. Securely fit memory modules into their slots, without forcing them
  3. Make sure that mixing and matching different sized memory modules is acceptable
  4. Even if the machine and OS like the extra memory, making an application accept more than 2GB or 4GB requires recompiling the binaries for 64-bit mode.
  5. The previous point applies even if you just use an application rather than build it. In that case, get ready to license the 64-bit version of the application.

In addition to OperatingSystem limits, the MemoryArchitecture (flat or segmented) and the CPU/MMU can determine certain limits as well. In a flat 32-bit address space, the virtual memory for a process is limited to 4GB (2^32 == 4 * 2^30) and sometimes to 2GB because of certain limitations on the "sign" bit. 64-bit OperatingSystems, CPU/MMUs, and architectures can theoretically offer an unprecedented amount (exabytes); in practice, however, the number is limited by how many memory modules can fit in the backplanes and on the motherboard. The first 64-bit desktop computers (Athlon 64 and Apple's G5) seem to be limited to 8GB, which is only a factor of 2 or 4 over fully decked-out 32-bit computers. Video-editing and CAD programs, with their massive amounts of data, will be the primary beneficiaries on the desktop.

The current-day "Big Boys" like to use 64-bit mode and systems that support much more memory (some can hold terabytes); they are of primary interest for large-scale databases, including DataWarehouses and DataMining instances. Databases will greatly enjoy the increased BufferCache, though special mechanisms may be needed to lock the virtual memory system's pages in memory to prevent paging. Regardless, databases of considerable size typically require tuning of various OS parameters. Memory's latency and throughput still beat the hell out of any SecondaryStorage by several OrdersOfMagnitude, but so does its cost.
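The address-space figures for flat 32-bit and 64-bit modes are plain powers of two, which can be checked with a few lines of arithmetic. The function and constant names below are made up for this sketch, and "GB" is taken in the binary sense (2^30 bytes), as it is in the text.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a flat address space with the given pointer width.
std::uint64_t address_space_bytes(unsigned bits) {
    assert(bits < 64);   // 2^64 itself does not fit in a 64-bit integer
    return std::uint64_t{1} << bits;
}

const std::uint64_t kGB = address_space_bytes(30);   // 2^30 bytes

// 2^64 bytes cannot be stored directly, so count it in units of 2^60 bytes:
// 2^64 / 2^60 = 2^4.
unsigned units_of_2_to_60_in_64_bit_space() {
    return 1u << (64 - 60);
}

// Example usage: the figures quoted for 32-bit and 64-bit modes.
void example_usage() {
    assert(address_space_bytes(32) == 4 * kGB);   // full 32-bit space
    assert(address_space_bytes(31) == 2 * kGB);   // one bit lost to the "sign"
    assert(units_of_2_to_60_in_64_bit_space() == 16);
}
```

So a full 64-bit address space is 16 units of 2^60 bytes, which is where the "exabytes" figure comes from.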

Partition the applications to more than one machine, if possible

If you have one application that exceeds the largest amount of memory possible on your current machine, I hope that you can afford to buy a new box that is compatible with the old one and allows for much more. Get ready to spend some money.

Partitioning a non-partitioned application can take a fair amount of software engineering. Depending on the beast, it can be far from trivial. Network latencies and throughput can affect how one decides to dice the tomato. GridComputing and ClusterComputing have their own parlance and methods to address many of the complicated issues.

MemoryExhaustion (last edited 2008-07-09 05:47:44 by localhost)