proj-oot-old-150618-ootMemoryManagementNotes1

--

Python has reference counting, which frees most memory immediately, as soon as it becomes unreferenced, but which never frees reference cycles. To catch the reference cycles, they have garbage collection on top of this. Because of their preexisting C extension API, they cannot use a traditional garbage collector, which marks objects reachable from the roots: extension code written in C can hold references that the interpreter has no way to enumerate, so Python can never know the full root set. Instead the cycle collector searches for UNreachable objects directly: for each container object it subtracts the references coming from inside the candidate set from that object's refcount, and whatever drops to zero is referenced only by other candidates, i.e. only by a cycle (the arctrix link below explains this in detail).
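
a small sketch of the same gap in (modern) Rust, whose Rc is plain reference counting: a strong cycle is never freed, and Rust's answer is manual cycle-breaking with Weak, where Python's is the cycle-detecting collector layered on top:

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        // a strong Rc back-pointer here would form a cycle that refcounting
        // alone never frees; Weak breaks the cycle by hand
        parent: RefCell<Weak<Node>>,
        children: RefCell<Vec<Rc<Node>>>,
    }

    fn main() {
        let parent = Rc::new(Node {
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        });
        let child = Rc::new(Node {
            parent: RefCell::new(Rc::downgrade(&parent)), // weak edge: no cycle
            children: RefCell::new(Vec::new()),
        });
        parent.children.borrow_mut().push(Rc::clone(&child));

        // strong counts: parent = 1 (the local), child = 2 (local + child list)
        assert_eq!(Rc::strong_count(&parent), 1);
        assert_eq!(Rc::strong_count(&child), 2);
        // if the parent field were a strong Rc<Node> instead, dropping both
        // locals would leave both counts at 1 and the pair would leak;
        // that is exactly the garbage Python's cycle collector exists to find
    }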

Lua has an incremental garbage collector.

Apparently writing extension modules in Python requires unwanted interaction with Python's reference counting/garbage collection (http://lua-users.org/lists/lua-l/2007-11/msg00248.html : " Compared to other language's binding mechanisms, Lua's C API effectively provides a shield between the internals of the Lua VM and the outer world. One does not have to care about internal structures of the Lua VM or even the garbage collector (how many refcounting bugs do your Python bindings have?).")

I like the idea of combining reference counting with an incremental garbage collector. Not sure how to do it. Should think about whether and how to gain Lua's lack of need for extensions to interact with refcounting, and whether and how to avoid whatever design decision Python used that prevented them from using traditional GC.

http://arctrix.com/nas/python/gc/

upon a suggestion of an incremental GC for Lua, this guy argued for a Python-esque approach: http://lua-users.org/lists/lua-l/2003-07/msg00359.html ; so did this guy: http://lua-users.org/lists/lua-l/2003-05/msg00444.html

and later provides more arguments (many of which are not applicable to a non-copying incremental GC): http://lua-users.org/lists/lua-l/2003-07/msg00370.html

this post notes that inferno/limbo uses a Python-esque system: http://lua-users.org/lists/lua-l/2003-05/msg00496.html

this guy suggests that if refcounting is used, as an optimization, references from the stack are omitted from the counts (i.e. deferred reference counting): http://lua-users.org/lists/lua-l/2003-07/msg00359.html . This guy likes it but notes that it makes finalizer semantics more complex: http://lua-users.org/lists/lua-l/2003-07/msg00370.html . This post suggests that if that optimization is used, there should be a language API that allows one to manually trigger garbage collection (and hence finalization) of objects referenced from the stack: http://lua-users.org/lists/lua-l/2003-07/msg00381.html

http://stackoverflow.com/questions/4484167/details-how-python-garbage-collection-works

this post argues against a Python-esque model and for a pure incremental garbage collector, although it also seems to say that something like reference counting should be an optimization detail, and the only important issue is how much overhead the incremental garbage collector has in the write barrier:

http://lua-users.org/lists/lua-l/2003-08/msg00020.html

"I am willing to believe reference counting would relieve pressure on the M&S algorithm, but we are talking about a generational optimisation for Lua so this should become less of a problem. I think the issue here is whether the write barrier implementation for the generational incremental GC is a bigger overhead than just using RC."

this guy also argues against it: http://lua-users.org/lists/lua-l/2003-08/msg00088.html he notes among other things that naive reference counting, like stop-the-world garbage collectors, has unbounded pause times, whereas incremental garbage collectors can have bounded pause times. he also discusses read-barrier-based and write-barrier-based GCs.

here's an interesting suggestion: http://lua-users.org/lists/lua-l/2003-08/msg00101.html

i would extend the previous suggestion to say that, like disabling interrupts, there could be a language API to allow Oot code to tell the language to stop automatic garbage collection, and to run the collector only on demand.

excellent glossary:

http://www.iecc.com/gclist/GC-algorithms.html

good intro that also discusses what sort of annoying constraints the various GCs place on the rest of your language implementation:

http://llvm.org/docs/GarbageCollection.html

this blog post likes Lua's GC better than Python's, likes that you can turn automatic GC off, and implicitly suggests that refcounting+GC might make the GC part even faster:

http://www.altdevblogaday.com/2011/06/23/from-python-to-lua/

on the complexity of the don't-reference-count-on-the-stack optimization: http://lua-users.org/lists/lua-l/2003-05/msg00446.html

http://lua-users.org/lists/lua-l/2003-05/msg00447.html

random technical points: http://lua-users.org/lists/lua-l/2003-05/msg00498.html http://lua-users.org/lists/lua-l/2003-05/msg00483.html

some random links in this post: http://lua-users.org/lists/lua-l/2003-05/msg00465.html

here's Go's garbage collector: http://stackoverflow.com/questions/7823725/what-kind-of-garbage-collection-does-go-use

it's a plain stop-the-world mark-and-sweep, so i dont think we want that one. it also doesn't support weak refs.

points out that incremental reference counting is also complicated: http://lua-users.org/lists/lua-l/2003-05/msg00458.html

--

could allow the programmer to assign integer 'finalizer levels' to objects. a cycle of objects with finalizers will be finalized and garbage collected in order from least to greatest level, iff their 'finalizer levels' are all distinct.
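
a toy sketch of that rule (all made-up design, not an existing API; Rust just for concreteness):

    struct CycleMember {
        name: &'static str,
        finalizer_level: u32, // the hypothetical programmer-assigned level
    }

    // given a detected garbage cycle: if all levels are distinct, finalize in
    // ascending level order; on a tie the order would be ambiguous, so punt
    fn finalize_cycle(mut cycle: Vec<CycleMember>) {
        cycle.sort_by_key(|m| m.finalizer_level);
        let all_distinct = cycle
            .windows(2)
            .all(|w| w[0].finalizer_level != w[1].finalizer_level);
        if all_distinct {
            for m in &cycle {
                println!("finalizing {} (level {})", m.name, m.finalizer_level);
            }
        } else {
            println!("levels not distinct; fall back to some other policy");
        }
    }

    fn main() {
        finalize_cycle(vec![
            CycleMember { name: "conn", finalizer_level: 0 },
            CycleMember { name: "pool", finalizer_level: 1 },
        ]);
    }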

--

"

upvote

jerf 2 days ago

link

It's easier to start from a GC'ed language, and use escape hatches when you need them, then to start without GC and then realize that your program has become a mess because you don't have it.

Many GC'ed languages have such escape hatches. Those that don't, well, don't start writing performance-critical code in them would be my recommendation "

--

" 2) Pointers everywhere. Pointers, pointers and yet more pointers, and more than that still. So datastructures in java will never match their equivalent in C++ in lookup speeds. Plus, in C++ you can do intrusive datastructures (not pretty, but works), which really wipe the floor with Java's structures. If you intend to store objects with lots of subobjects, this will bit you. As this wasn't bad enough java objects feel the need to store metadata, whereas C++ objects pretty much are what you declared them to be (the overhead comes from malloc, not from the language), unless you declared virtual member functions, in which case there's one pointer in there. In Java, it may (Sadly) be worth it to not have one object contain another, but rather copy all fields from the contained object into the parent object. You lose the benefits of typing (esp. since using an interface for this will eliminate your gains), but it does accelerate things by keeping both things together in memory. "

--

so should be able to directly embed a subobject in a parent object, without an extra pointer. also should be able to do things like intrusive linked lists
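
illustration in Rust terms (sizes in the comments assume a typical 64-bit target):

    use std::mem::size_of;

    struct Child { a: u64, b: u64 }

    // child laid out inline in the parent: one allocation, no pointer chase
    struct Embedded { tag: u64, child: Child }

    // child behind a pointer: an extra indirection plus a separate heap block
    struct Indirect { tag: u64, child: Box<Child> }

    fn main() {
        println!("Embedded: {} bytes", size_of::<Embedded>()); // 24: everything inline
        println!("Indirect: {} bytes", size_of::<Indirect>()); // 16 here + 16 elsewhere
    }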

--

what is the metadata that the above quote (from https://news.ycombinator.com/item?id=6425412 ) says that Java needs to store with each object? (presumably the per-object header: on typical JVMs, a mark word used for GC and locking plus a pointer to the class, i.e. roughly two words per object.) do we need to do that?

--

interesting that Apple chose automatic reference counting for Objective-C on iOS ( http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/ ) despite Apple's preference for speed in iOS ( https://news.ycombinator.com/item?id=6016628 ), although garbage collection is supposedly better for throughput.

otoh http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/ seems to think that garbage collection is only at a huge disadvantage in very memory-constrained situations. in the future, perhaps CPUs will speed up more slowly than memory becomes cheap, leading to less memory-constrained situations?

--

did i ever note this? garbage collection for young generations and reference counting for old ones (the 'ulterior reference counting' paper):

http://grothoff.org/christian/teaching/2007/4705/urc-oopsla-2003.pdf

--

https://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Reference_counting

notes that ref count updates can often be omitted, e.g. when a pointer is merely moved (e.g. upon a function call), or when one object is owned by a larger object (i think, mb i got that wrong)
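
the pointer-move case is easy to see in Rust, where a move of an Rc is literally an elided refcount update (clone increments, move doesn't):

    use std::rc::Rc;

    fn take(_r: Rc<i32>) {} // takes ownership; drops (and decrements) at the end

    fn main() {
        let a = Rc::new(5);
        let b = a.clone(); // a real new reference: count goes 1 -> 2
        assert_eq!(Rc::strong_count(&b), 2);

        let c = b; // a move: ownership transfers, count stays 2
        assert_eq!(Rc::strong_count(&c), 2);

        take(c); // passing by value is also a move, not an increment
        assert_eq!(Rc::strong_count(&a), 1); // `take` dropped its argument
    }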

--

so far i am leaning towards something like Inferno's thing:

"

Then I found this quote from Ken Thompson (http://www.computer.org/computer/thompson.htm), and it crystalized my little rant about GC:


Thompson, talking about the Limbo language in Inferno:

    In addition, the language implementation -- and again I don't want
    to take any credit -- doesn't have big mark-and-sweep type garbage
    collection. It has reference counting: If you open something and
    then return, it's gone by reference count. Thus, you don't have
    high and low watermarks because 99 percent of the garbage goes
    away as soon as it is dereferenced. If you store a null in a
    pointer, it chases the pointer and all that stuff goes.
    If you make cycles, there is a background distributed
    mark-and-sweep algorithm that just colors the next step a little
    bit at a time. It doesn't have to be very aggressive because there
    is not much garbage around in most applications: People really
    don't leave dangling loop structures that require this kind of
    algorithm. So you can devote just one percent of your time in the
    background looking for garbage without these monster
    mark-and-sweep things.
    So, again, it's pragmatic. It's not the theoretical
    top-of-the-line garbage collection paper. It's just a way of doing
    it that seems to be very, very effective.

end Thompson quote " -- http://tulrich.com/geekstuff/ref-counting-is-good.txt

see also

http://lua-users.org/lists/lua-l/2003-05/msg00496.html

also note that i want to defer reference chasing so that if one free causes a cascade, there isn't a long pause
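
a minimal sketch of the worklist idea (not a real allocator): when something dies, queue its children instead of freeing them transitively, and free a bounded number per step:

    struct Obj {
        children: Vec<Obj>, // owning edges; a plain drop would cascade through these
    }

    struct DeferredFreer {
        pending: Vec<Obj>,
    }

    impl DeferredFreer {
        fn release(&mut self, obj: Obj) {
            self.pending.push(obj); // O(1): no reference chasing yet
        }

        // do at most `budget` units of freeing work, then return to the mutator
        fn step(&mut self, budget: usize) {
            for _ in 0..budget {
                match self.pending.pop() {
                    Some(mut obj) => {
                        // move the children onto the queue rather than dropping
                        // them transitively right now; obj itself then drops
                        // childless, in O(1)
                        self.pending.append(&mut obj.children);
                    }
                    None => break,
                }
            }
        }
    }

    fn main() {
        let big = Obj {
            children: (0..1000).map(|_| Obj { children: vec![] }).collect(),
        };
        let mut freer = DeferredFreer { pending: vec![] };
        freer.release(big); // constant time, no matter how big the subtree
        while !freer.pending.is_empty() {
            freer.step(64); // bounded pause per step
        }
    }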

also, probably want a non-moving GC (if any) so that you can give pointers to external functions via the FFI without pinning

--

maybe in addition to a 'turn off background GC' command, have a 'do this part fast' boundary type, e.g. sort of like turning off interrupts
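
roughly this kind of surface API (every name here is hypothetical, sketched in Rust):

    // hypothetical scoped "no GC" region, analogous to disabling interrupts
    fn with_gc_paused<T>(f: impl FnOnce() -> T) -> T {
        gc_pause(); // hypothetical: stop background collection steps
        let result = f();
        gc_resume(); // hypothetical: allow collection again
        result
    }

    fn gc_pause() { /* stub */ }
    fn gc_resume() { /* stub */ }

    fn main() {
        let x = with_gc_paused(|| {
            // latency-critical section: no GC steps interleaved here
            1 + 1
        });
        assert_eq!(x, 2);
    }

a real version would want an RAII-style guard so collection resumes even if the critical section fails partway through.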

--

another idea for destructor cycles: don't guarantee that the object will be deallocated immediately after its destructor runs (but only in the case of a destructor cycle)

--

the new Python system for dealing with finalizers and reference cycles looks good:

http://www.python.org/dev/peps/pep-0442/

---

toread

http://www.memorymanagement.org/articles/lang.html


rust is cool:

" The ~ sigil represents a unique handle for a memory allocation on the heap:

    {
        // an integer allocated on the heap
        let y = ~10;
    }
    // the destructor frees the heap memory as soon as `y` goes out of scope

Rust includes syntax for heap memory allocation in the language since it's commonly used, but the same semantics can be implemented by a type with a custom destructor.

8 Ownership

Rust formalizes the concept of object ownership to delegate management of an object's lifetime to either a variable or a task-local garbage collector. An object's owner is responsible for managing the lifetime of the object by calling the destructor, and the owner determines whether the object is mutable.

Ownership is recursive, so mutability is inherited recursively and a destructor destroys the contained tree of owned objects. Variables are top-level owners and destroy the contained object when they go out of scope. A box managed by the garbage collector starts a new ownership tree, and the destructor is called when it is collected.

    // the struct owns the objects contained in the `x` and `y` fields
    struct Foo { x: int, y: ~int }

    {
        // `a` is the owner of the struct, and thus the owner of the struct's fields
        let a = Foo { x: 5, y: ~10 };
    }
    // when `a` goes out of scope, the destructor for the `~int` in the struct's field is called

    // `b` is mutable, and the mutability is inherited by the objects it owns
    let mut b = Foo { x: 5, y: ~10 };
    b.x = 10;

If an object doesn't contain garbage-collected boxes, it consists of a single ownership tree and is given the Owned trait which allows it to be sent between tasks. Custom destructors can only be implemented directly on types that are Owned, but garbage-collected boxes can still contain types with custom destructors.

9 Boxes

Many modern languages represent values as pointers to heap memory by default. In contrast, Rust, like C and C++, represents such types directly. Another way to say this is that aggregate data in Rust are unboxed. This means that if you let x = Point { x: 1f, y: 1f };, you are creating a struct on the stack. If you then copy it into a data structure, you copy the entire struct, not just a pointer.

For small structs like Point, this is usually more efficient than allocating memory and indirecting through a pointer. But for big structs, or mutable state, it can be useful to have a single copy on the stack or on the heap, and refer to that through a pointer.

9.1 Owned boxes

An owned box (~) is a uniquely owned allocation on the heap. It inherits the mutability and lifetime of the owner as it would if there was no box:

    let x = 5;     // immutable
    let mut y = 5; // mutable
    y += 2;

    let x = ~5;     // immutable
    let mut y = ~5; // mutable

The purpose of an owned box is to add a layer of indirection in order to create recursive data structures or cheaply pass around an object larger than a pointer. Since an owned box has a unique owner, it can only be used to represent a tree data structure.

The following struct won't compile, because the lack of indirection would mean it has an infinite size:

struct Foo { child: Option<Foo> }

    Note: The Option type is an enum that represents an optional value. It's comparable to a nullable pointer in many other languages, but stores the contained value unboxed.

Adding indirection with an owned pointer allocates the child outside of the struct on the heap, which makes it a finite size and won't result in a compile-time error:

struct Foo { child: Option<~Foo> }

9.2 Managed boxes

A managed box (@) is a heap allocation with the lifetime managed by a task-local garbage collector. It will be destroyed at some point after there are no references left to the box, no later than the end of the task. Managed boxes lack an owner, so they start a new ownership tree and don't inherit mutability. They do own the contained object, and mutability is defined by the type of the managed box (@ or @mut). An object containing a managed box is not Owned, and can't be sent between tasks.

    let a = @5;         // immutable

    let mut b = @5;     // mutable variable, immutable box
    b = @10;

    let c = @mut 5;     // immutable variable, mutable box

    let mut d = @mut 5; // mutable variable, mutable box

A mutable variable and an immutable variable can refer to the same box, given that their types are compatible. Mutability of a box is a property of its type, however, so for example a mutable handle to an immutable box cannot be assigned a reference to a mutable box.

    let a = @1;     // immutable box
    let b = @mut 2; // mutable box

    let mut c : @int;     // declare a variable with type managed immutable int
    let mut d : @mut int; // and one of type managed mutable int

    c = a; // box type is the same, okay
    d = b; // box type is the same, okay

    // but b cannot be assigned to c, or a to d
    c = b; // error

10 Move semantics

Rust uses a shallow copy for parameter passing, assignment and returning values from functions. A shallow copy is considered a move of ownership if the ownership tree of the copied value includes an owned box or a type with a custom destructor. After a value has been moved, it can no longer be used from the source location and will not be destroyed there.

    let x = ~5;
    let y = x.clone(); // y is a newly allocated box
    let z = x;         // no new memory allocated, x can no longer be used

Since in owned boxes mutability is a property of the owner, not the box, mutable boxes may become immutable when they are moved, and vice-versa.

    let r = ~13;
    let mut s = r; // box becomes mutable

11 Borrowed pointers

Rust's borrowed pointers are a general purpose reference type. In contrast with owned boxes, where the holder of an owned box is the owner of the pointed-to memory, borrowed pointers never imply ownership. A pointer can be borrowed to any object, and the compiler verifies that it cannot outlive the lifetime of the object.

As an example, consider a simple struct type, Point:

struct Point { x: float, y: float }

We can use this simple definition to allocate points in many different ways. For example, in this code, each of these three local variables contains a point, but allocated in a different location:

    let on_the_stack : Point  = Point { x: 3.0, y: 4.0 };
    let managed_box  : @Point = @Point { x: 5.0, y: 1.0 };
    let owned_box    : ~Point = ~Point { x: 7.0, y: 9.0 };

Suppose we want to write a procedure that computes the distance between any two points, no matter where they are stored. For example, we might like to compute the distance between on_the_stack and managed_box, or between managed_box and owned_box. One option is to define a function that takes two arguments of type point—that is, it takes the points by value. But this will cause the points to be copied when we call the function. For points, this is probably not so bad, but often copies are expensive. So we’d like to define a function that takes the points by pointer. We can use borrowed pointers to do this:

    fn compute_distance(p1: &Point, p2: &Point) -> float {
        let x_d = p1.x - p2.x;
        let y_d = p1.y - p2.y;
        sqrt(x_d * x_d + y_d * y_d)
    }

Now we can call compute_distance() in various ways:

    compute_distance(&on_the_stack, managed_box);
    compute_distance(managed_box, owned_box);

Here the & operator is used to take the address of the variable on_the_stack; this is because on_the_stack has the type Point (that is, a struct value) and we have to take its address to get a value. We also call this borrowing the local variable on_the_stack, because we are creating an alias: that is, another route to the same data.

In the case of the boxes managed_box and owned_box, however, no explicit action is necessary. The compiler will automatically convert a box like @point or ~point to a borrowed pointer like &point. This is another form of borrowing; in this case, the contents of the managed/owned box are being lent out.

Whenever a value is borrowed, there are some limitations on what you can do with the original. For example, if the contents of a variable have been lent out, you cannot send that variable to another task, nor will you be permitted to take actions that might cause the borrowed value to be freed or to change its type. This rule should make intuitive sense: you must wait for a borrowed value to be returned (that is, for the borrowed pointer to go out of scope) before you can make full use of it again.

For a more in-depth explanation of borrowed pointers, read the borrowed pointer tutorial.

11.1 Freezing

Borrowing an immutable pointer to an object freezes it and prevents mutation. Owned objects have freezing enforced statically at compile-time.

    let mut x = 5;
    {
        let y = &x; // x is now frozen, it cannot be modified
    }
    // x is now unfrozen again

Mutable managed boxes handle freezing dynamically when any of their contents are borrowed, and the task will fail if an attempt to modify them is made while they are frozen:

    let x = @mut 5;
    let y = x;
    {
        let z = &*y; // the managed box is now frozen
        // modifying it through x or y will cause a task failure
    }
    // the box is now unfrozen again

"

-- http://static.rust-lang.org/doc/0.8/tutorial.html
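
(that's Rust 0.8 syntax. in modern Rust, ~T became Box<T>, the @ managed boxes were removed entirely before 1.0, with Rc<T> plus RefCell<T> as the closest surviving analogue, and borrowed pointers are just references. the distance example today would read:)

    use std::rc::Rc;

    struct Point { x: f64, y: f64 }

    fn compute_distance(p1: &Point, p2: &Point) -> f64 {
        let x_d = p1.x - p2.x;
        let y_d = p1.y - p2.y;
        (x_d * x_d + y_d * y_d).sqrt()
    }

    fn main() {
        let on_the_stack = Point { x: 3.0, y: 4.0 };
        let shared = Rc::new(Point { x: 5.0, y: 1.0 }); // closest analogue of @Point
        let owned = Box::new(Point { x: 7.0, y: 9.0 }); // ~Point became Box<Point>

        // deref coercion turns &Rc<Point> and &Box<Point> into &Point
        println!("{}", compute_distance(&on_the_stack, &shared));
        println!("{}", compute_distance(&shared, &owned));
    }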

---

some discusson on nimrod's GC:

https://news.ycombinator.com/item?id=6820474

--

http://pivotallabs.com/why-not-to-use-arc/

--

https://docs.python.org/3/whatsnew/3.4.html#pep-442-safe-object-finalization

---

" Since only one task can own a boxed array at a time, if instead of distributing our numbers array to a single task we wanted to distribute it to many tasks, we would need to copy the array for each. Let's see an example that uses the clone method to create copies of the data:

    fn main() {
        let numbers = vec![1i, 2i, 3i];

        for num in range(0u, 3) {
            let (tx, rx) = channel();
            // Use `clone` to send a *copy* of the array
            tx.send(numbers.clone());
            spawn(proc() {
                let numbers = rx.recv();
                println!("{:d}", numbers[num as uint]);
            })
        }
    }

This is similar to the code we had before, except now we loop three times, making three tasks, and cloning numbers before sending it.

However, if we're making a lot of tasks, or if our data is very large, creating a copy for each task requires a lot of work and a lot of extra memory for little benefit. In practice, we might not want to do this because of the cost. Enter Arc, an atomically reference counted box ("A.R.C." == "atomically reference counted"). Arc is the most common way to share data between tasks. Here's some code:

use std::sync::Arc;

    fn main() {
        let numbers = vec![1i, 2i, 3i];
        let numbers = Arc::new(numbers);

        for num in range(0u, 3) {
            let (tx, rx) = channel();
            tx.send(numbers.clone());
            spawn(proc() {
                let numbers = rx.recv();
                println!("{:d}", (*numbers)[num as uint]);
            })
        }
    }

This is almost exactly the same, except that this time numbers is first put into an Arc. Arc::new creates the Arc, .clone() makes another Arc that refers to the same contents. So we clone the Arc for each task, send that clone down the channel, and then use it to print out a number. Now instead of copying an entire array to send it to our multiple tasks we are just copying a pointer (the Arc) and sharing the array.

How can this work though? Surely if we're sharing data then can't we cause data races if one task writes to the array while others read?

Well, Rust is super-smart and will only let you put data into an Arc that is provably safe to share. In this case, it's safe to share the array as long as it's immutable, i.e. many tasks may read the data in parallel as long as none can write. So for this type and many others Arc will only give you an immutable view of the data. " -- http://doc.rust-lang.org/master/intro.html

---

Go 1.4+ GC Plan and Roadmap

https://docs.google.com/document/d/16Y4IsnNRCN43Mx0NZc5YXZLovrHvvLhK_h0KN8woTO4/edit

https://news.ycombinator.com/item?id=8148666

---

random LtU discussion on real-time GC:

http://lambda-the-ultimate.org/node/2393

---

how to concisely let user choose arc vs tracing gc?
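
in today's Rust one answer is to write code against 'some shared pointer' and let the caller choose; a hypothetical tracing Gc<T> could slot into the same hole:

    use std::ops::Deref;
    use std::rc::Rc;
    use std::sync::Arc;

    // generic over the pointer type: Rc (cheap, single-thread), Arc (atomic,
    // cross-thread), or, hypothetically, a tracing Gc<T>
    fn describe<P: Deref<Target = String>>(p: &P) {
        println!("len = {}", p.len());
    }

    fn main() {
        let a: Rc<String> = Rc::new("hello".to_string());
        let b: Arc<String> = Arc::new("world".to_string());
        describe(&a);
        describe(&b);
    }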

---

"

Internal Memory Management

To speed-up memory allocation (and reuse) Python uses a number of lists for small objects. Each list will contain objects of similar size: there will be a list for objects 1 to 8 bytes in size, one for 9 to 16, etc. When a small object needs to be created, either we reuse a free block in the list, or we allocate a new one.

There are some internal details on how Python manages those lists into blocks, pools, and “arena”: a number of blocks forms a pool, pools are gathered into arenas, etc., but they’re not very relevant to the point we want to make (if you really want to know, read Evan Jones’ ideas on how to improve Python’s memory allocation). The important point is that those lists never shrink.

Indeed: if an item (of size x) is deallocated (freed by lack of reference) its location is not returned to Python’s global memory pool (and even less to the system), but merely marked as free and added to the free list of items of size x. The dead object’s location will be reused if another object of compatible size is needed. If there are no dead objects available, new ones are created.

If small objects memory is never freed, then the inescapable conclusion is that, like goldfishes, these small object lists only keep growing, never shrinking, and that the memory footprint of your application is dominated by the largest number of small objects allocated at any given point.

Therefore, one should work hard to allocate only the number of small objects necessary for one task, favoring (otherwise unpythonèsque) loops where only a small number of elements are created/processed rather than (more pythonèsque) patterns where lists are created using list generation syntax then processed.

While the second pattern is more à la Python, it is rather the worst case: you end up creating lots of small objects that will come populate the small object lists, and even once the list is dead, the dead objects (now all in the free lists) will still occupy a lot of memory.

The fact that the free lists grow does not seem like much of a problem because the memory it contains is still accessible to the Python program. But from the OS’s perspective, your program’s size is the total (maximum) memory allocated to Python. Since Python returns memory to the OS on the heap (that allocates other objects than small objects) only on Windows, if you run on Linux, you can only see the total memory used by your program increase. " -- http://deeplearning.net/software/theano/tutorial/python-memory-management.html

---

" Running an application with Metronome

Metronome is designed to provide RT behavior to existing applications. No user code modification should be required. Desired heap size and target utilization must be tuned to the application so target utilization maintains the desired application throughput while letting the GC keep up with allocation. Users should run their applications at the heaviest load they want to sustain to ensure RT characteristics are preserved and application throughput is sufficient. This article's Tuning Metronome section explains what you can do if throughput or utilization is insufficient. In certain situations, Metronome's short pause-time guarantees are insufficient for an application's RT characteristics. For these cases, you can use the RTSJ to avoid GC-incurred pause times.

The Real-time Specification for Java

The RTSJ is a "specification for additions to the Java platform to enable Java programs to be used for real-time applications." Metronome must be aware of certain aspects of the RTSJ -- in particular, RealtimeThreads (RT threads), NoHeapRealtimeThreads (NHRTs), and immortal memory. RT threads are Java threads that, among other characteristics, run at a higher priority than regular Java threads. NHRTs are RT threads that can't contain references to heap objects. In other words, NHRT-accessible objects can't refer to objects subject to GC. In exchange for this compromise, the GC won't impede the scheduling of NHRTs, even during a GC cycle. This means NHRTs won't incur any pause times. Immortal memory provides a memory space that's not subject to GC; this means NHRTs are allowed to refer to immortal objects. These are only some aspects of the RTSJ; see Resources for a link to the complete specification. " -- http://www.ibm.com/developerworks/java/library/j-rtj4/index.html


http://www.jopdesign.com/doc/nbgc.pdf points out that there are two typically atomic parts of GC which lead to stop-the-world pauses; scanning the GC roots, and copying objects during heap compaction.

http://www.jopdesign.com/doc/nbgc.pdf points out that although GC roots must be scanned atomically, this is only per-thread, so at least one thread can be executing while another one is doing GC stuff (scanning its roots atomically).

http://www.ibm.com/developerworks/java/library/j-rtj4/index.html points out that a large part of GC pauses is due to atomic copying of large objects from one memory location to another during heap compaction. It solves this by having a maximum size for a contiguous block in memory; e.g. a large array is split into many discontiguous memory blocks. Now the length of the pause is bounded by the time it takes to copy just one of these blocks.
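
a sketch of that chunked ('arraylet') representation, ignoring GC integration and just showing the layout and the extra indirection it costs per access:

    // a large array stored as fixed-size chunks, so a compactor only ever has
    // to move CHUNK elements at a time, never the whole array
    const CHUNK: usize = 4; // tiny for demonstration; real systems use ~KB chunks

    struct ChunkedArray {
        chunks: Vec<Box<[i64; CHUNK]>>,
        len: usize,
    }

    impl ChunkedArray {
        fn new(len: usize) -> Self {
            let n_chunks = (len + CHUNK - 1) / CHUNK;
            ChunkedArray {
                chunks: (0..n_chunks).map(|_| Box::new([0; CHUNK])).collect(),
                len,
            }
        }

        fn get(&self, i: usize) -> i64 {
            assert!(i < self.len);
            self.chunks[i / CHUNK][i % CHUNK] // one extra hop per access
        }

        fn set(&mut self, i: usize, v: i64) {
            assert!(i < self.len);
            self.chunks[i / CHUNK][i % CHUNK] = v;
        }
    }

    fn main() {
        let mut a = ChunkedArray::new(10);
        a.set(9, 42);
        assert_eq!(a.get(9), 42);
    }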

another solution for pauses during heap compaction is write barriers: if you do the copies concurrently with ongoing execution, then when someone writes to the object being copied (the write goes to the source, but not the destination, because you haven't changed the pointer yet), you keep a log of the write; after the copy is finished, you replay the logged writes onto the destination object before actually changing the pointer. Now you only need atomicity when you actually swap in the new pointer. (question: couldn't there be multiple references to the same object in different places? don't you have to swap those all atomically to the new object? or do you use another layer of indirection ("handles")? see the next links.)

write barriers are also used in incremental or parallel garbage collection to prevent the mutators (the application program) from creating, mid-way through a GC cycle, a new reference from an object that the GC has already traced to one it hasn't, behind the GC's back; if the mutator then also deletes every other path to the destination object, the GC never rescans the already-traced source, concludes the destination is unreachable, and deletes a live object. (the answer to 'how did the new reference get created at all?' is that the destination was still reachable via some not-yet-traced path at the moment the reference was copied; the danger is that that old path can disappear before the GC gets to it.) http://www.ibm.com/developerworks/java/library/j-rtj4/index.html also talks about something called a "fuzzy barrier" that i don't understand, which has to do with "objects being moved from one thread to another".
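
a toy model of that lost-object race and the barrier that prevents it (incremental-update style: when a fully-traced 'black' object is given a pointer to an unvisited 'white' object, the barrier re-marks the target 'grey' so the collector revisits it):

    #[derive(Clone, Copy, PartialEq, Debug)]
    enum Color {
        White, // not yet seen by the collector
        Grey,  // seen, but children not yet scanned
        Black, // fully traced; the collector won't look at it again
    }

    struct Heap {
        color: Vec<Color>,
        edges: Vec<Vec<usize>>, // edges[i] = objects that object i points to
    }

    impl Heap {
        // mutator write "obj.field = target", with the write barrier
        fn write(&mut self, obj: usize, target: usize) {
            self.edges[obj].push(target);
            if self.color[obj] == Color::Black && self.color[target] == Color::White {
                self.color[target] = Color::Grey; // the barrier: re-expose target
            }
        }

        // one bounded increment of tracing: scan and blacken one grey object
        fn trace_step(&mut self) {
            if let Some(g) = (0..self.color.len()).find(|&i| self.color[i] == Color::Grey) {
                for j in 0..self.edges[g].len() {
                    let t = self.edges[g][j];
                    if self.color[t] == Color::White {
                        self.color[t] = Color::Grey;
                    }
                }
                self.color[g] = Color::Black;
            }
        }
    }

    fn main() {
        // object 0 is already traced (black), 1 is pending (grey), and 2 is
        // unvisited (white), reachable only through 1 at this point
        let mut h = Heap {
            color: vec![Color::Black, Color::Grey, Color::White],
            edges: vec![vec![1], vec![2], vec![]],
        };
        h.write(0, 2);      // black 0 now points at white 2; the barrier greys 2
        h.edges[1].clear(); // the old path to 2 disappears before it's scanned
        while h.color.iter().any(|&c| c == Color::Grey) {
            h.trace_step();
        }
        // sweep would free anything still white; 2 survived thanks to the barrier
        assert_eq!(h.color[2], Color::Black);
    }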

http://stackoverflow.com/questions/2663292/how-does-heap-compaction-work-quickly#comment4586534_2668486 suggests the indirection solution

https://en.wikipedia.org/wiki/Handle_%28computing%29 however says "Doubly indirect handles have fallen out of favour in recent times, as increases in available memory and improved virtual memory algorithms have made the use of the simpler pointer more attractive. "

---

what does C's malloc do? it does no compaction: a typical malloc manages arenas/free lists, so fragmentation of the arena can increase arbitrarily, in the worst case causing memory consumption to increase arbitrarily: https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_native_memory_fragmentation_and_process_size_growth?lang=en

---

http://www.ibm.com/developerworks/java/library/j-rtj4/index.html also uses a 'cooperative suspend' in which, when the GC needs to stop the world (STW) (which is always for a bounded amount of time in this system), it notifies all threads; each thread then stores any held object references into locations reachable from the roots, and sleeps until the GC signals that it's done

---

support read and write barriers; support associating some GC data (like reference counts) with each reference

---

https://news.ycombinator.com/item?id=9163125 suggests that you should be able to group eg an array of structs, or a tree of structs, or whatever, into a single 'allocation unit', to save a tracing garbage collector time
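
e.g. the usual arena/index representation, where a whole tree is a single allocation from the collector's point of view:

    struct Node {
        value: i64,
        left: Option<usize>,  // indices into the arena instead of pointers
        right: Option<usize>,
    }

    struct TreeArena {
        nodes: Vec<Node>, // the single 'allocation unit' for the whole tree
    }

    impl TreeArena {
        fn add(&mut self, value: i64, left: Option<usize>, right: Option<usize>) -> usize {
            self.nodes.push(Node { value, left, right });
            self.nodes.len() - 1
        }

        fn sum(&self, root: usize) -> i64 {
            let n = &self.nodes[root];
            n.value
                + n.left.map_or(0, |i| self.sum(i))
                + n.right.map_or(0, |i| self.sum(i))
        }
    }

    fn main() {
        let mut arena = TreeArena { nodes: Vec::new() };
        let l = arena.add(1, None, None);
        let r = arena.add(2, None, None);
        let root = arena.add(3, Some(l), Some(r));
        assert_eq!(arena.sum(root), 6);
    }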

---