proj-oot-ootMemoryManagementNotes3

sources of GC latency in golang, from https://tip.golang.org/doc/gc-guide : " Brief stop-the-world pauses when the GC transitions between the mark and sweep phases, Scheduling delays because the GC takes 25% of CPU resources when in the mark phase, User goroutines assisting the GC in response to a high allocation rate, and Pointer writes requiring additional work while the GC is in the mark phase. Running goroutines must be suspended for their roots to be scanned. "

so, the relevant ones for us are:

Getting to Go: The Journey of Go's Garbage Collector mentions that:

---

i don't think we'll do this but it is interesting, b/c that's not what i thought 'reference counting' was:

randyrand 26 days ago

parent prev next [–]

I feel like the most user friendly solution for Use After Free is to just use Reference Counting. Basically, just copy Objective-C's "weak_ptr" design.

For every allocation, also store on the heap a "reference object", that keeps track of this new reference.

  struct reference_t {
      // the ptr returned by malloc
      void* ptr;
      // the stack/heap location that ptr was written to,
      // i.e. the reference location.
      void* referenceAddr;
  };

Reference tracking: every time ptr is copied or overwritten, create and delete reference objects as needed.

If ptr is freed, visit every reference and call *referenceAddr = NULL, turning all of these references into NULL pointers.
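a sketch of that scheme, in Python standing in for the runtime's instrumented pointer writes (all names here are mine, not from the comment):

```python
# Every allocation keeps a list of "reference locations" -- the places its
# pointer has been stored. free() visits them all and nulls them out,
# turning dangling pointers into null pointers (as the comment describes).

class Runtime:
    def __init__(self):
        # allocation handle -> list of (container, key) locations holding it
        self.locs = {}

    def alloc(self):
        handle = object()              # stands in for the ptr from malloc
        self.locs[handle] = []
        return handle

    def write(self, container, key, handle):
        # instrumented pointer store: forget the overwritten reference's
        # record, then record the new reference location
        old = container.get(key)
        if old is not None and old in self.locs:
            self.locs[old] = [(c, k) for (c, k) in self.locs[old]
                              if not (c is container and k == key)]
        container[key] = handle
        if handle is not None:
            self.locs[handle].append((container, key))

    def free(self, handle):
        # visit every recorded reference and null it out
        for container, key in self.locs.pop(handle):
            container[key] = None
```

so a use-after-free becomes a null-pointer dereference rather than a silent read of recycled memory; the cost is bookkeeping on every pointer write.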

---

maybe ask somewhere for advice:

Choosing an automatic memory management scheme for a high-level programming language.

I'm looking for a simple automatic memory management scheme with short pause times. Most programming languages that I see want performant memory management at the cost of being complex, but I care more about simplicity; except that I do care about having short pause times.

Reference counting or tracing garbage collection or both are ok (if there is a third option, that's okay too); but if reference counting, some sort of cycle detection should be in there too. The high-level language will have a runtime, so it's okay to require that the runtime do things like:

- box values and store GC metadata with each box
- have read barriers or write barriers or both
- do reference counting, or some other computation whenever memory is accessed or changed

The developer using this high-level programming language should not have to think about memory management at all, so stuff like what Rust does is out of scope.

Criteria:

My best guess right now is that I should use reference counting plus a cycle detector, and a write barrier that is on only when the cycle detector is confirming a cycle, but I could be wrong.

What do you recommend?

(later: well i think a 'cycle detector' would have to be a full tracing GC, like Python, b/c what if each node in every cycle participates in multiple cycles? It's not as simple as looking for nodes with refcount 1, following outgoing references from them, and finding a cycle)

shorter version:

Choosing an automatic memory management scheme for a programming language.

I'm looking for a simple automatic memory management scheme with short pause times. Most programming languages that I see want performant memory management at the cost of being complex, but I care more about simplicity; except that I do care about having short pause times.

This will be a high-level programming language with a runtime. The programmer should not have to think about memory management much, if at all (therefore the programmer should not have to do Rust-style thinking about ownership). The runtime will know where pointers are and is allowed to do things like box values and store GC metadata there, have write barriers and/or read barriers, reference count, etc.

Criteria:

What do you recommend?

---

I'm looking for recommendations for a simple automatic memory management scheme for a programming language. Reference-counting and/or tracing garbage collection are both options (I'm open to other suggestions too). I care about short pause times.

Unlike many programming language implementations that I see, which are performant at the cost of a complex implementation, I care most about simplicity of implementation, and am willing to accept poor performance (except for my concern about pause times). This is all relative; I will accept a little implementation complexity in exchange for a lot of performance. So the criteria are: short pause times > simple implementation > low memory usage > fast

I'm on the fence about compaction. I do want the user to be able to run programs for a long time without running out of memory. However, various contemporary language implementations, such as Golang (Go) and Python have non-compacting GCs and I don't hear a lot of complaints about people's programs crashing due to memory fragmentation. Is fragmentation only a theoretical issue that doesn't come up in practice -- or, do those implementations have to do something complicated to avoid fragmentation, something that I could avoid doing if I do compaction?

This will be a high-level programming language with a runtime. The programmer should not have to think about memory management much, if at all (the language should detect reference cycles). The runtime will know where pointers are and could do things like box values and store GC metadata there, have write barriers and/or read barriers, reference count, etc. The memory management scheme should be portable and concurrent.

What do you recommend?

---

"The M-Machine’s limitation of power-of-two-sized objects was a show stopper for a lot of real-world things (at least 25% memory overhead, close to 50% in some cases). " -- [1]

---

Maas, Martin Christoph. "Hardware and Software Support for Managed-Language Workloads in Data Centers." Thesis/dissertation, 2017. Permalink: https://escholarship.org/uc/item/1qv7m47s

https://escholarship.org/content/qt1qv7m47s/qt1qv7m47s.pdf -- see sections 7.2 and 7.2.1

---

doug-moen 24 hours ago

link flag

RC in C++ using std::shared_ptr is a different proposition than RC in a memory safe language that is designed for safe and seamless RC. In the latter case, the compiler knows about RC and can do a lot of tricks to minimize cost.

High performance GC uses a lot of tricks to achieve performance. If the same amount of work is put into optimizing RC, using an optimizing compiler that emits different kinds of refcount management code in different situations, then you can do far better than what’s possible using a smart pointer class in C++.

In C++ using std::shared_ptr, it’s true that RC is all done locally and you can’t stop it halfway. That isn’t true in a compiler supported and optimized RC runtime. You can use lazy RC, for example. Deferred RC is also nonlocal.

Furthermore, with the approach I’m describing, you can eliminate the need for atomic operations for RC update in multithreaded code. There are several ways to do lock-free RC.

For more information, the Wikipedia article is a start but there’s a lot more information about techniques in papers that you can find with a Google Scholar search.

https://en.wikipedia.org/wiki/Reference_counting https://scholar.google.com/scholar?&q=reference%20counting
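for concreteness, here is a toy sketch of one of the tricks mentioned (Deutsch-Bobrow-style deferred RC, where stack references are not counted at all; the names and structure are my illustration, not from the comment):

```python
# Deferred RC: heap-to-heap references are counted, stack references are not.
# Objects whose count drops to zero go into a zero-count table (ZCT); at
# collection time, anything in the ZCT not reachable from the stack is freed.

class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0                    # counts heap references only
        self.fields = {}

class DeferredRC:
    def __init__(self):
        self.heap = set()
        self.zct = set()               # zero-count table

    def new(self, name):
        o = Obj(name)
        self.heap.add(o)
        self.zct.add(o)                # rc == 0 until stored into the heap
        return o

    def store(self, container, field, target):
        # only heap stores touch reference counts -- no work on stack ops
        old = container.fields.get(field)
        if old is not None:
            old.rc -= 1
            if old.rc == 0:
                self.zct.add(old)
        container.fields[field] = target
        if target is not None:
            target.rc += 1
            self.zct.discard(target)

    def collect(self, stack_roots):
        # free ZCT members not on the stack, cascading into their children
        roots = set(stack_roots)
        while True:
            dead = [o for o in self.zct if o not in roots]
            if not dead:
                return
            for o in dead:
                self.zct.discard(o)
                self.heap.discard(o)
                for child in o.fields.values():
                    if child is not None:
                        child.rc -= 1
                        if child.rc == 0:
                            self.zct.add(child)
```

the payoff is that the extremely frequent stack pushes/pops do no RC work at all; the cost is a periodic stack scan.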

---

"AFAIK, the best modern state of art garbage collectors have stop-the-world pauses, proportional to size of root set and/or thread count." -- [2]

"alloc/free..can be doing all sorts of book keeping behind the scenes, including cascading destructor calls which could free a chain of objects the size of live memory." [3]

"The Garbage Collection Handbook originally published in 1996 defines garbage collection as: "All garbage collection schemes are based on one of four fundamental approaches; mark-sweep collection, copying collection, mark-compact collection or reference counting."" -- [4]

--- real-time inplace tricolor marking as per https://dl.acm.org/doi/pdf/10.1145/130854.130862 (thanks davmac https://lobste.rs/s/qdtqi2/automatic_memory_management_with_short#c_4c4fde ) looks good

some potentially good stuff here:

https://www.iecc.com/gclist/GC-algorithms.html

---

https://lobste.rs/s/qdtqi2/automatic_memory_management_with_short#c_68owhm

NOTE: updated version of this list, specific for this project, is at ootMemoryManagementThoughts.txt; if you want to add stuff, add it there

A summary of things I think I've learned so far (from the replies in the first 2 days)


---

ok after reading that i figured i should lookup concurrent tricolor garbage collectors. I googled for concurrent incremental tricolor barrier and found some interesting results. The most useful was probably this:

https://www.cs.tau.ac.il/~maon/teaching/2014-2015/seminar/seminar1415a.html

which is a sort of journal club where the students read http://gchandbook.org/ and each presented a chapter or so to the class. The PDFs of the presentations are on this webpage.

There are chapters on

i note that the phrases "incremental", "concurrent", "real-time" and "on-the-fly" all seem to come up in papers discussing techniques relevant to me.

my notes on Runtime interface and language specific concerns: http://www.cs.tau.ac.il/~maon/teaching/2014-2015/seminar/seminar1415a-lec6-runtime.pdf

the following are quotes but i omitted the '...' marks where i omit things; also as usual i change punctuation and layout: " Where to look for pointers

Accurate pointer finding using tagged values

Accurate pointer finding in stacks Finding frames within the stack

Handling interior pointers Computing standard reference from interior pointer

Handling derived pointers Definition Derived pointer is a pointer that is derived from other pointers using arithmetic or logic expression

Object tables Definition Object table - dense array of small records which describe objects

Object tables advantages

Object tables disadvantages

References from external (non-managed) code

If the object must survive only for the duration of the call to external code, it's enough to leave a live reference to the object on the stack

Stack barriers

It’s possible to scan a stack incrementally using stack barriers

GC-safe points

Most systems have short sequences of code that must be run entirely to preserve GC invariants

Write barriers detect and record interesting pointers in remembered sets

Card tables Definition Card tables record written pointers with card’s size granularity. Implemented as dense array of cards (memory blocks) indexed by dense array of bytes (header), which mark dirty cards.
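a card table as defined above is tiny to implement; a minimal sketch (sizes illustrative, names mine):

```python
CARD_SIZE = 512   # bytes of heap covered per card; a power of two, illustrative

class CardTable:
    def __init__(self, heap_size):
        # dense array of bytes, one per card (memory block)
        self.cards = bytearray((heap_size + CARD_SIZE - 1) // CARD_SIZE)

    def write_barrier(self, addr):
        # on every pointer store, dirty the card covering the written address
        self.cards[addr // CARD_SIZE] = 1

    def dirty_ranges(self):
        # the collector rescans only the dirty cards (then clears them)
        for i, dirty in enumerate(self.cards):
            if dirty:
                yield (i * CARD_SIZE, (i + 1) * CARD_SIZE)
```

the barrier is one unconditional byte store, which is why card marking is so cheap; the tradeoff is that the collector must rescan a whole card's worth of memory per dirty byte.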

" Weak references with tracing collectors

---

notes on Concurrent GC

tricolor intro (i'm not going to bother reiterating this, it's widely available eg wikipedia)

the mutator can be colored black or gray, too (to keep track of the status of the roots)

(sorta-quotations, as in the previous section)" Mutator color influences color of allocated memory

Allocating black objects is always safe

Tricolor invariant types

Barrier techniques To maintain one of the two invariants, the following actions can be taken at barriers:

Incremental update solutions

Snapshot-at-the-beginning solutions

all possible barrier solutions:

Grey mutator incremental update (v1)

Maintains the strong invariant

Grey mutator incremental update (v2)

Less precise

Grey mutator incremental update (v3)

(can this be replaced by a non-atomic version without the isBlack condition check? Surprisingly, apparently no)

Black mutator incremental update (v1)

Maintains the strong invariant (after snapshotting mutator)

Black mutator incremental update (v2)

Less precise (more coarse-grained) than before

Black mutator snapshot-at-the-beginning

atomic Write(src, i, ref):
    shade(src[i])    (shade the overwritten target; some presentations guard this with an isWhite check)
    src[i] <- ref

Maintains the weak invariant
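a minimal Python illustration of the snapshot-at-the-beginning (deletion) barrier above, Yuasa-style (my names; real implementations emit this per-field in compiled code):

```python
# Deletion barrier: before overwriting a pointer, shade the old target.
# Everything reachable in the snapshot at the start of marking then survives
# this cycle, which is exactly the weak-invariant guarantee.

class Obj:
    def __init__(self):
        self.color = 'white'
        self.refs = [None]

class GC:
    def __init__(self):
        self.marking = True
        self.grey = []

def write(gc, src, i, ref):
    old = src.refs[i]
    if gc.marking and old is not None and old.color == 'white':
        old.color = 'grey'         # shade the overwritten target
        gc.grey.append(old)
    src.refs[i] = ref
```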

Black mutator hybrid barrier

Maintains the weak invariant

Completeness of barrier techniques

Card tables: " Concurrent write barrier mechanisms

---

my notes on http://www.cs.tau.ac.il/~maon/teaching/2014-2015/seminar/seminar1415a-lec12-conc-rc.pdf

most of these are quotes, some are not: " to have less locking: buffered reference counting

Sliding View Reference Counting (Coalesced Reference Counting, and Sliding View):

An Efficient On-the-Fly Cycle Collection. A "big algorithm" - incorporates many techniques: Mark & Sweep (Lecs. 2 & 10), Generational GC (Lec. 5), The Recycler (Lec. 3), Coalesced reference counting, Sliding views. Relatively new - Paz et al. 2007

Young generation: high mutation and death rates; use Mark & Sweep - trace only the live objects. Old generation: low mutation and death rates; use Reference counting - handles big live heaps

The Recycler. Proposed by Bacon & Rajan in 2001. Reclaims unreachable cycles using trial deletion. Algorithm: build a set of candidates; decrement counts for each candidate (mark as gray); scan objects reachable from candidates with rc != 0 - those items are marked black, the rest are marked white; remove white objects. Many traversals are required - needs a "fixed" heap
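a synchronous toy version of the Recycler's trial deletion (my simplification of Bacon & Rajan's algorithm; the real thing buffers candidates and runs concurrently):

```python
# Trial deletion: tentatively remove the candidates' internal references;
# any object whose count is still > 0 afterwards is externally referenced
# and gets its counts restored; the rest are a garbage cycle.

class Obj:
    def __init__(self):
        self.rc = 0
        self.refs = []
        self.color = 'black'

def link(src, dst):
    src.refs.append(dst)
    dst.rc += 1

def collect_cycles(candidates):
    def mark_gray(o):
        if o.color != 'gray':
            o.color = 'gray'
            for c in o.refs:
                c.rc -= 1          # trial-delete internal references
                mark_gray(c)

    def scan(o):
        if o.color == 'gray':
            if o.rc > 0:
                scan_black(o)      # still externally referenced: live
            else:
                o.color = 'white'  # tentatively garbage
                for c in o.refs:
                    scan(c)

    def scan_black(o):
        o.color = 'black'
        for c in o.refs:
            c.rc += 1              # undo the trial deletion
            if c.color != 'black':
                scan_black(c)

    for o in candidates:
        mark_gray(o)
    for o in candidates:
        scan(o)

    garbage = []
    def collect_white(o):
        if o.color == 'white':
            o.color = 'black'      # avoid revisiting
            garbage.append(o)
            for c in o.refs:
                collect_white(c)
    for o in candidates:
        collect_white(o)
    return garbage
```

on a pure two-node cycle both objects come back as garbage; if anything outside the cycle still points in, scan_black restores all the counts and nothing is freed.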

---

my notes on Concurrent Mark-Sweep

"Doligez-Leroy-Gonthier [1993]: Let's separate data allocated solely on a single thread and not shared with other threads. In addition we have a global heap that allows sharing of objects among threads. Private objects may move to the global heap, if needed."

see slides for algorithms

(is this paper relevant? http://pauillac.inria.fr/~doligez/publications/doligez-gonthier-popl-1994.pdf ?)

clearer than that paper and those slides: https://justinethier.github.io/cyclone/docs/Garbage-Collector

looks pretty darn complicated. But still perhaps the simplest thing that is reasonable...

---

my notes on [5]

mostly quotes, as above: " Outline: Concurrent Copying

compacting vs copying: with copying you are copying stuff all the time; with compacting you copy stuff only when you feel that you want to -- more freedom.

Concurrent copying invariants: the To-space invariant - the mutator must, by definition, hold only To-space pointers. The From-space invariant - the mutator must, by definition, hold only From-space pointers at the beginning of the collector cycle.

Termination: all mutator threads must end the collection cycle holding only To-space pointers!

Baker's read barrier: every object read by the mutator must be in To-space! When the mutator tries to read through a pointer:

atomic Read(src, i):
    ref <- src[i]
    if ref != null && isGrey(src)
        ref <- forward(ref)
    return ref

forward(fromRef):
    toRef <- forwardingAddress(fromRef)
    if toRef = null
        toRef <- copy(fromRef)
    return toRef

copy(fromRef):
    toRef <- free
    free <- free + size(fromRef)
    move(fromRef, toRef)
    forwardingAddress(fromRef) <- toRef
    add(worklist, toRef)
    return toRef

objects allocated in To-space are considered Black

expensive in time and space

Halstead's multiprocessor Baker-style algorithm (1985)

alternative:

Brooks's indirection barrier. Brooks suggested a different approach: instead of requiring the To-space invariant, allow the mutator to make progress without concern for the wavefront. Brooks observed that if every object (whether in From-space or To-space) has a non-null forwarding pointer (either to its From-space original or to its copy in To-space), then the test on the src object in the read barrier can be eliminated!

atomic Read(src, i):
    src <- forwardingAddress(src)
    return src[i]

atomic Write(src, i, ref):
    src <- forwardingAddress(src)
    if isBlack(src)
        ref <- forward(ref)
    src[i] <- ref

Brooks's indirection barrier - termination: termination requires all mutators to hold only To-space references, but Brooks allows mutators to operate grey! Therefore, once copying is finished, all mutators need a final scan of their stacks to replace any remaining unforwarded references
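the forwarding-pointer trick is easy to see in miniature (Python, my names):

```python
# Brooks's unconditional forwarding pointer: every object carries a pointer
# that initially points to itself, so every access pays one indirection
# but needs no isGrey/phase test at all.

class Obj:
    def __init__(self, payload):
        self.forward = self            # self-forwarding until copied
        self.payload = payload

def read(o):
    return o.forward.payload           # unconditional indirection, no test

def write(o, value):
    o.forward.payload = value          # writes also land on the current copy

def gc_copy(o):
    # the collector copies o to To-space and installs the forwarding pointer
    new = Obj(o.payload)
    o.forward = new
    return new
```

after gc_copy, reads and writes through the stale From-space reference transparently hit the To-space copy, which is why mutators are allowed to keep running with grey (unforwarded) references until the final stack scan.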

pros:

cons:

more complicated ones: "

Concurrent compaction: the one-pass Compressor [Kermany and Petrank, 2006]

downside: trapping the mutator means copying all of the accessed page's objects and updating the pointers in those objects to refer to To-space locations -- significant pauses

Click and Azul to the rescue! The Pauseless collector [Click et al, 2005; Azul, 2008]

The Pauseless collector is divided into 3 phases: Mark phase- responsible for periodically refreshing the mark bits. Relocate phase- uses the most recently available mark bits to find pages with little amount of live data, relocate their objects and free their physical memory Remap phase- updates every pointer in the heap whose target has been relocated. "

Summary "

---

https://justinethier.github.io/cyclone/docs/Garbage-Collector (and many other places) point out that if your language creates a lot of garbage quickly, a copying GC is better, at least for the young generation, because most of the memory is dead but it only copies the live memory

honestly Cyclone Scheme's writeup, https://justinethier.github.io/cyclone/docs/Garbage-Collector looks like almost the simplest thing that is reasonable.

the tracing itself is just tricolor marking with a write barrier. There's some additional fanciness just to make sure the collector is sync'd with the mutators, without having to do a global pause of all mutators at once
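that core -- tricolor marking plus a write barrier, run incrementally -- can be sketched like this (a single-threaded toy with a Dijkstra-style insertion barrier; this is my illustration, not Cyclone's actual DLG code):

```python
# Incremental tricolor marking: do a bounded amount of tracing per step,
# and use a write barrier so the mutator can never hide a white object
# behind an already-scanned (black) one.

WHITE, GREY, BLACK = 'white', 'grey', 'black'

class Obj:
    def __init__(self, refs=None):
        self.color = WHITE
        self.refs = refs or []

class Marker:
    def __init__(self, roots):
        self.grey = []
        for r in roots:
            self.shade(r)

    def shade(self, o):
        if o is not None and o.color == WHITE:
            o.color = GREY
            self.grey.append(o)

    def mark_step(self, budget):
        # trace at most `budget` objects, then return to the mutator
        while self.grey and budget > 0:
            o = self.grey.pop()
            for child in o.refs:
                self.shade(child)
            o.color = BLACK
            budget -= 1
        return not self.grey       # True once marking is complete

    def write(self, src, i, ref):
        # insertion barrier: never install a white pointer into a black object
        if src.color == BLACK:
            self.shade(ref)
        src.refs[i] = ref
```

when marking finishes, everything still WHITE is unreachable and can be swept; the barrier is what makes it safe to interleave mutator steps between mark_step calls.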

the DLG GC is at https://github.com/justinethier/cyclone/blob/master/gc.c. looks pretty long... (later) actually i've been skimming through it, it's pretty straightforward

it uses http://concurrencykit.org/ btw

btw they also added http://justinethier.github.io/cyclone/docs/Garbage-Collection-Using-Lazy-Sweeping

i guess in our case we could skip the copying collector and just use DLG; also we don't need lazy sweeping


how PyPy reduced their pauses with an incremental tricolor GC with a write barrier:

" This "major collection" is what gives the long GC pauses. To fix this problem we made the GC incremental: instead of running one complete major collection, we split its work into a variable number of pieces and run each piece after every minor collection for a while, until there are no more pieces. The pieces are each doing a fraction of marking, or a fraction of sweeping. It adds some few milliseconds after each of these minor collections, rather than requiring hundreds of milliseconds in one go.

The main issue is that splitting the major collections means that the main program is actually running between the pieces, and so it can change the pointers in the objects to point to other objects. This is not a problem for sweeping: dead objects will remain dead whatever the main program does. However, it is a problem for marking. Let us see why.

In terms of the incremental GC literature, objects are either "white", "gray" or "black". This is called tri-color marking. See for example this blog post about Rubinius, or this page about LuaJIT or the wikipedia description. The objects start as "white" at the beginning of marking; become "gray" when they are found to be alive; and become "black" when they have been fully traversed. Marking proceeds by scanning grey objects for pointers to white objects. The white objects found are turned grey, and the grey objects scanned are turned black. When there are no more grey objects, the marking phase is complete: all remaining white objects are truly unreachable and can be freed (by the following sweeping phase).

In this model, the important part is that a black object can never point to a white object: if the latter remains white until the end, it will be freed, which is incorrect because the black object itself can still be reached. How do we ensure that the main program, running in the middle of marking, will not try to write a pointer to white object into a black object? This requires a "write barrier", i.e. a piece of code that runs every time we set a pointer into an object or array. This piece of code checks if some (hopefully rare) condition is met, and calls a function if that is the case.

The trick we used in PyPy is to consider minor collections as part of the whole, rather than focus only on major collections. The existing minimark GC had always used a write barrier of its own to do its job, like any generational GC. This existing write barrier is used to detect when an old object (outside the nursery) is modified to point to a young object (inside the nursery), which is essential information for minor collections. Actually, although this was the goal, the actual write barrier code is simpler: it just records all old objects into which we write any pointer --- to a young or old object. As we found out over time, doing so is not actually slower, and might actually be a performance improvement: for example, if the main program does a lot of writes into the same old object, we don't need to check over and over again if the written pointer points to a young object or not. We just record the old object in some list the first time, and that's it.

The trick is that this unmodified write barrier works for incminimark too. Imagine that we are in the middle of the marking phase, running the main program. The write barrier will record all old objects that are being modified. Then at the next minor collection, all surviving young objects will be moved out of the nursery. At this point, as we're about to continue running the major collection's marking phase, we simply add to the list of pending gray objects all the objects that we just considered --- both the objects listed as "old objects that are being modified", and the objects that we just moved out of the nursery. A fraction of the former list were black objects; so this means that they are turned back from the black to the gray color. This technique implements nicely, if indirectly, what is called a "backward write barrier" in the literature. The backwardness is about the color that needs to be changed in the opposite of the usual direction "white -> gray -> black", thus making more work for the GC. (This is as opposed to "forward write barrier", where we would also detect "black -> white" writes but turn the white object gray.)

In summary, I realize that this description is less about how we turned minimark into incminimark, and more about how we differ from the standard way of making a GC incremental. What we really had to do to make incminimark was to write logic that says "if the major collection is in the middle of the marking phase, then add this object to the list of gray objects", and put it at a few places throughout minor collection. Then we simply split a major collection into increments, doing marking or sweeping of some (relatively arbitrary) number of objects before returning. That's why, after we found that the existing write barrier would do, it was not much actual work, and could be done without major changes. For example, not a single line from the JIT needed adaptation. All in all it was relatively painless work. ;-) "

-- [6]
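a toy version of the PyPy scheme described in the quote (my names; the point is that the one generational barrier just records written-to old objects, and the minor collection re-greys them while major marking is in progress):

```python
# incminimark-style trick: the write barrier only records old objects that
# are written to; if a major marking phase is underway, the next minor
# collection turns those objects (plus nursery survivors) back to grey.

class Obj:
    def __init__(self, is_old=False):
        self.is_old = is_old
        self.recorded = False
        self.color = 'black'
        self.fields = {}

class IncMiniMark:
    def __init__(self):
        self.modified_old = []

    def write(self, obj, field, value):
        # the single write barrier: record old objects written to, nothing else
        if obj.is_old and not obj.recorded:
            obj.recorded = True
            self.modified_old.append(obj)
        obj.fields[field] = value

    def minor_collection(self, grey_list, major_marking, nursery_survivors):
        if major_marking:
            # "backward" barrier effect: black objects go back to grey
            for o in self.modified_old + nursery_survivors:
                o.color = 'grey'
                grey_list.append(o)
        for o in self.modified_old:
            o.recorded = False
        self.modified_old.clear()
```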

---

https://www.google.com/search?q=cocurrent%20incremental%20tricolor%20barrier https://www.google.com/search?q=Baker%20Brooks%20Sapphire%20concurrent%20garbage%20collection

---

" > And of course, not all kinds of object lend themselves to garbage collection - for instance, file descriptors, since you can't guarantee when or if they'll ever close. So you have to build your own reference counting system on top of your garbage collected system.

This is not a typical solution.

Java threw finalizers into the mix and everyone overused them at first before they realized that finalizers suck. This is bad enough that, in response to "too many files open" in your Java program, you might invoke the GC. Other languages designed since then typically use some kind of scoped system for closing file descriptors. This includes C# and Go.

Garbage collection does not need to be used to collect all objects. " -- https://news.ycombinator.com/item?id=32282512

---

" origin_path 10 days ago

root parent prev next [–]

The best real world garbage collectors are like ZGC, Shenandoah or C4 which don't pause your program at all. They are fully concurrent.

Even if you go for a GC that is designed to balance throughput and latency, like G1, you can still configure what pause times it should target and those can easily be ~10-20msec if you want, with the vast majority of pauses being far less than that (less than 3 msec).

"I was recently reading some blog posts of the V8 JS engine team. There is nothing to understand or not understand there"

Look, if your knowledge of GC comes from reading V8 blog posts then you're proving the grandparent's point pretty nicely. You leaped from an extremely basic and remote understanding of GC to "there's nothing to understand or not understand here".

reply

ahartmetz 10 days ago

root parent next [–]

Serious questions: if these garbage collectors are so good, why aren't they widely used? My guess is that they have downsides such as low throughput, high CPU usage, high memory usage, etc... You can't import a JVM GC into V8, to stay with my example, but you can reimplement the ideas if you have quasi-infinite money like Google.

reply

origin_path 10 days ago

root parent next [–]

For ZGC/Shenandoah because they're new, and they're new because they're extremely hard to implement well. For C4 because it is expensive and requires kernel patches. Also there isn't a whole lot of need for them in many use cases. Web servers for example have far bigger latency problems than GC, normally. Pauseless GC was historically driven by the HFT/finance sector for that reason.

Also yes, pauseless GC tends to have higher overheads than GC that pauses for longer. Whether that matters or not depends a lot on the use cases and actual size of the overheads. For example ZGC is pauseless but not yet generational. Generational ZGC when it launches will improve throughput significantly.

Google haven't done pauseless GC for V8. I don't know why not because they have done one for Android. ART uses a fully concurrent collector iirc. At any rate, although you can't import a JVM GC to V8, you can run JavaScript on the JVM using GraalJS and use the GCs that way. Though I don't recall off hand if Graal supports ZGC yet. There's no deep reason why it couldn't.

reply "

---

" ART GC overview

ART has a few different GC plans that consist of running different garbage collectors. Starting with Android 8 (Oreo), the default plan is Concurrent Copying (CC). The other GC plan is Concurrent Mark Sweep (CMS).

Some of the main characteristics of Concurrent Copying GC are:

    CC enables use of a bump-pointer allocator called RegionTLAB. This allocates a thread-local allocation buffer (TLAB) to each app thread, which can then allocate objects out of its TLAB by bumping the "top" pointer, without any synchronization.
    CC performs heap defragmentation by concurrently copying objects without pausing app threads. This is achieved with the help of a read-barrier which intercepts reference reads from the heap, without the need of any intervention from the app developer.
    GC only has one small pause, which is constant in time with regards to the heap size.
    CC extends to be a generational GC in Android 10 and higher. It enables collecting young objects, which often become unreachable fairly quickly, with little effort. This helps by increasing GC throughput and considerably delaying the need to perform a full-heap GC.

The other GC that ART still supports is CMS. This GC also supports compaction, but not concurrently. Compaction is avoided until the app goes into the background, at which time the app threads are paused to perform compaction. Compaction also becomes necessary when an object allocation fails due to fragmentation. In this case the app may potentially become unresponsive for some time.

Since CMS rarely compacts, and thus free objects may not be contiguous, it uses a free-list-based allocator called RosAlloc. It has a higher allocation cost as compared to RegionTLAB. Finally, because of internal fragmentation the memory usage for the Java heap can be higher for CMS than CC. "
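the RegionTLAB idea -- per-thread bump-pointer allocation -- is simple to sketch (addresses as plain integers, sizes illustrative, names mine):

```python
# A thread-local allocation buffer: allocation is just bumping a "top"
# pointer, with no synchronization because the buffer belongs to one thread.

class TLAB:
    def __init__(self, base, size):
        self.top = base
        self.limit = base + size

    def alloc(self, nbytes):
        if self.top + nbytes > self.limit:
            return None        # TLAB exhausted: ask the GC for a fresh buffer
        addr = self.top
        self.top += nbytes
        return addr
```

this is why a compacting/copying heap makes allocation so cheap: free memory is one contiguous run, so alloc is an add and a compare, versus a free-list search in a non-moving heap like RosAlloc's.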

---

Waterluvian 10 days ago

parent next [–]

I’m out of my league so this may be dumb, but does any language or VM or whatnot have a combined system where each thread has its own heap, and you can talk by passing messages, but they also have a common heap for larger data that’s too expensive to pass around, but at the cost that you have to be much more careful with lifetimes or have to manage it manually or something?

reply

tkhattra 10 days ago

root parent next [–]

In the Erlang VM, each Erlang "process" has its own garbage collected heap [1]. There's also an ETS module for more efficiently storing and sharing large amounts of data, but data stored in ETS is not automatically garbage collected [2].

[1] https://www.erlang.org/doc/apps/erts/garbagecollection [2] https://www.erlang.org/doc/man/ets.html

reply

jefftk 10 days ago

root parent prev next [–]

If you squint at it right you could say C works that way: you have many processes each with their own heap and they can pass messages, but if they want larger data that's too expensive to pass around you can use shared memory.

reply

kaba0 10 days ago

root parent next [–]

That’s just IPC, nothing inherent to C.

reply

jefftk 10 days ago

root parent next [–]

Sure, you can do this in a lot of languages. I gave C mainly because it's the "native language" of Unix.

reply

Jweb_Guru 10 days ago

root parent prev next [–]

I believe Nim works this way, among others.

reply

jeffmurphy 10 days ago

root parent prev next [–]

I believe Erlang works this way.

reply

Bolkan 10 days ago

root parent prev next [–]

Dart

reply

---

ridiculous_fish 10 days ago

prev next [–]

There's a lot of discussion of comparative performance, but most software isn't performance sensitive so it just doesn't matter. But there's another major facet: FFIs. The choice of memory management has huge implications for how you structure your FFI.

JavaScriptCore uses a conservative GC: the C stack is scanned, and any word which points at a heap object will act as a root. v8 is different, it uses a moving collector: references to heap objects are held behind a double-redirection so the GC may move them. Both collectors are highly tuned and extremely fast, but their FFIs look very different because of their choice of memory management.

Read and write barriers also come into play. If your GC strategy requires that reads/writes go through a barrier, then this affects your FFI. This is part of what sunk Apple's ObjC GC effort: there was just a lot of C/C++ code which manipulated references which was subtly broken under GC; the "rules" for the FFI became overbearing.

Java's JNI also illustrates this. See the restrictions around e.g. GetPrimitiveArrayCritical. It's hard to know if you're doing the right thing, especially since bugs may only manifest if the GC runs, which it might not in your test.

One of the under-appreciated virtues of RC is the interoperability ease. I know std::sort only rearranges, doesn't add or remove references, so I can just call it. But if my host language has a GC then std::sort may mess up the card marking and cause a live object to be prematurely collected; but it's hard to know for sure!

reply

chubot 10 days ago

parent next [–]

I agree that the API and interop with C/C++ is a huge issue and something I haven't seen good articles on.

But I was sort of put off from reference counting by working with Python extensions that leaked memory many years ago. It's so easy to forget a ref count operation. I don't have data, but I suspect it happens a lot in practice.

With tracing, you have to annotate stack roots (and global roots if you have them). To me that seems less error prone. You can overapproximate them and it doesn't really change much.

Moving is indeed a big pain, and I'm about to back out of it for Oil :-/


edit: I don't have any experience with Objective C, but I also think this comment is unsurprising, and honestly I would probably get it wrong too:

https://news.ycombinator.com/item?id=32283641

I feel like ref counting is more "littered all over your code" than GC is, which means there's more opportunity to get it wrong.


kaba0 10 days ago

parent prev next [–]

It’s a good point that it is a topic that doesn’t get enough coverage, but let’s just add that it has good solutions for most use cases: GCs can pin objects that might be used from some other language.


---

 eru 10 days ago | parent | prev | next [–]

If you have pure, immutable and strict, you can't create cycles. (That's what Erlang does for example.) That makes a lot of memory management techniques much simpler. Both tracing garbage collection and reference counting.

If you have pure, immutable and lazy, you can get cycles. (That's Haskell.) This is almost as complicated for a GC as not having immutability.


---

aidenn0 on Dec 9, 2013

parent prev next [–]

See this: http://www.hpl.hp.com/personal/Hans_Boehm/gc/

But also, with a very small number of restrictions, you can give C a non-conservative GC.

ANSI C forbids accessing an object through a pointer of an incompatible type, except through a character type. Furthermore, unions are to be interpreted as the last type they were written as.

The three places you run into issues are:

1) You can treat two structs that begin with the same values interchangeably so long as you only access the common beginning of the structures.

2) You can freely convert pointers to integers and back.

3) Storage returned by malloc() can be used heterogeneously (however, with strict aliasing rules, you can't switch types of the data once you've accessed it)

If you forbid those three things, then you can have a precise (non-conservative) GC by adding runtime overhead to accessing data (once malloc'd data is accessed as a non-character type, it can be treated as that type forever).

However, lots and lots of code violates these rules; the most famous offender is probably the fast inverse square root [1]:

http://betterexplained.com/articles/understanding-quakes-fas...

---

benhoyt 23 hours ago

next [–]

The "Garbage Collection" chapter in the OP's recent book Crafting Interpreters goes into this in a bunch more detail (in the context of a language VM): https://craftinginterpreters.com/garbage-collection.html -- so well written, both this and the 2013 blog post.


(by 2013 blog post he means https://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/ )

rurban 16 hours ago

parent next [–]

But it stops at a primitive mark-and-sweep collector, not the more advanced Cheney compacting/copying semispace collector. It doesn't even mention how bad mark-and-sweep is, especially for larger heaps.

---

snarfy 11 hours ago

prev next [–]

For a long time I had an idea I thought was great - a lock free garbage collector that could be run on its own core to achieve zero latency. I've got 32 cores busy doing nothing and thought I could throw hardware at the problem and get garbage collection for free.

A bunch of people much smarter than me all chimed in on how this was a terrible idea. They argued it would destroy the cache and result in worse performance than a traditional garbage collector. Locality matters, apparently. I still sometimes think of this idea but it doesn't fit very well into the actual hardware.


hinkley 8 hours ago

parent next [–]

Interprocess cache invalidation is one of the bigger devils in the details. It’s one part of the memory corruption crisis that the Java Memory Model tries to solve and then a number of other languages copied.

There are concurrency libraries that allocate 8x as much memory as they need just to guarantee that two threads don’t share a cache line.


---


agentultra 25 days ago

parent context prev next [–]on: Some Thoughts on Zig

Another thing which makes Zig interesting is its approach to allocation: making a failure to allocate memory a catchable error, and enabling developers to easily assign their own allocators.

pjmlp 25 days ago

parent next [–]

Turbo Pascal 6.0 manual from 1990, page 210

http://www.bitsavers.org/pdf/borland/turbo_pascal/Turbo_Pasc...

What is old is new again. :)

alphazino 25 days ago

root parent next [–]

Yes, but I think you missed the point of the parent comment.

> enabling developers to easily assign their own allocators

Through a combination of Zig's convention to pass around allocators instead of using a global allocator, the allocator interface's ability to return errors, and the Zig language forcing you to handle all errors (unless you explicitly discard them), a lot of Zig code is written with allocation failure in mind.

This means that you can seamlessly pass a custom allocator, say a bump allocator backed by a fixed buffer, to a function in some third-party library and still handle the case where the buffer runs out of space.

---

this HN user frequently posts insightful comments with links to research papers about advanced garbage collection techniques https://news.ycombinator.com/threads?id=hayley-patton

this appears to be their blog: https://applied-langua.ge/

and their twitter: https://twitter.com/nodefunallowed/

they apparently use sbcl: https://twitter.com/nodefunallowed/status/1557691547336208384/photo/1

this is apparently their reddit username, according to https://news.ycombinator.com/item?id=32277443: https://www.reddit.com/user/theangeryemacsshibe/

---

adgjlsfhk1 26 days ago

parent context favorite on: Reference count, don't garbage collect

The blog post is largely incoherent for several reasons.

1. It recommends 8 byte counters. This implies making every object in your programming language 8 bytes bigger which is pretty much a non-starter. Almost everyone that actually uses reference counting uses more like 2-4 bits.

2. Reference counting can have arbitrarily long pauses as well. If you decrease the reference count of an object to zero, you have to decrease the reference count of all referenced objects, which can do significant amounts of work (specifically, it will do a lot of work in the cases where regular GC does almost no work).

3. The blog states that atomic writes are "basically free", but that ignores the fact that in multi-threaded code, atomic writes can actually be fairly expensive since each one requires communication between every thread (This is why python still has a GIL)

4. Because no one uses 8 bytes for the count (since you don't want to double your memory consumption), you still need a GC anyway to deal with objects that get too many references.

OskarS 26 days ago

root parent prev next [–]

> Almost everyone that actually uses reference counting uses more like 2-4 bits.

Here is an incomplete list of languages which use 8 bytes for their reference count on 64-bit computers:

1. Rust, in Rc<T> and Arc<T>

2. C++, in std::shared_ptr<T>

3. Objective-C, in NSObject's retainCount

4. Swift, because of Objective-C

5. Python, where reference count is Py_ssize_t

These were literally the first languages I thought of; they all use 64-bit types. I would argue that since reference counting is much rarer than GC, these make up the bulk of reference counting use in the real world. "Almost everyone", huh? It's a bit rich to accuse the author of being "largely incoherent" and then say this.

-- [7]

---

 kaba0 26 days ago | root | parent | prev | next [–]

With all due respect, why do you believe that your experience is meaningful compared to people writing actual industrial-scale runtimes, when RC is the hello world of GC algorithms? It’s not that they don’t know about it, it’s that they studied the topic countless times and found significant difference between the two approaches, to the point that no language considered fast uses RC (or if they do, they do so because it is a useful escape hatch to manual memory management that doesn’t need support from the runtime).

---

benibela 25 days ago

prev next [–]

An advantage of RC is that you can also use it to verify ownership.

When the counter is 1, you can do anything with the object without affecting any other references.

Like the object could be mutable for a counter=1, and copy-on-write otherwise. Then you can make a (lazy) deep copy by just increasing the counter.

---

pizlonator 25 days ago

prev next [–]

Atomic inc/dec is hella expensive relative to not doing it. It’s true that CPUs optimize it, but not enough to make it free. RC as a replacement for GC means doing a lot more of this expensive operation - which the GC will do basically zero of in steady state - so this means RC just costs more. Like 2x slowdown more.

The atomic inc/dec also have some nasty effects on parallel code. The cpu ends up thinking you mutated lines you didn’t mean to mutate.

So, GC is usually faster. RC has other benefits (more predictable behavior and timing, uses less memory, plays nicer with OS APIs).

manuelabeledo 25 days ago

parent next [–]

> So, GC is usually faster.

GC is way faster if there is little collection.

In memory or cache intensive applications, garbage collection as a whole can be significantly slower.

pizlonator 25 days ago

root parent next [–]

GC is faster even if you collect a lot. GCs create better cache locality especially for recently allocated objects, and their cache behavior is not generally worse than malloc (but there are many GCs and many mallocs and some try harder than others to make caches happy).

The total time spent in GC across a program’s execution time is usually around 30% or so. Maybe more in some cases (some crazy Java workloads can go higher) or less in others (JavaScript, since the mutator is slow), but 30% is a good rule of thumb. That includes the barriers, and total cost of all allocations, including the cost of running the GC itself.

Reference counting applied as a solution to memory safety, as a replacement for GC, is going to cost you 2x overhead just for the ref counting operations and then some more on top of that for the actual malloc/free. When you throw in the fact that GCs always beat malloc/free in object churn workloads, it’s likely that the total overhead of counting refs, calling free(), and using a malloc() that isn’t a GC malloc is higher than 2x, i.e. more than 50% of time spent in memory management operations (inc, dec, malloc, free).

It’s a trade off, though. The GC achieves that 30% because it uses more memory. All of the work of understanding the object graph is amortized into a graph search that happens infrequently, leading to many algorithmic benefits (like no atomic inc/dec, faster allocation fast path, freeing is freeish, etc), but also causing free memory to be reused with a delay, leading to 2x or more memory overhead.

That also implies that if you ask the GC to run with lower memory overhead, it’ll use more than 30% of your execution time. It’s true that if you want the memory usage properties of RC, and you try to tune your GC to get you there, you gonna have a slow GC. But that’s not how most GC users run their GCs.

---

" But yeah, the correct way to handle resources (not just memory!) is with value semantics and RAII. Because then you know the object will be cleaned up as soon as it goes out of scope with zero additional effort on your part. In places where this is not appropriate, a simple reference counting scheme may be used, but the idea is to keep the number of rc'd objects small. Do not use cyclical data structures. Enforce a constraint: zero cycles. For data structures like arbitrary graphs, keep an array of vertices and an array of edges that reference vertices by index. " -- [8]

---

"objc said this some small number of bits is more than enough in all normal code, and falls back to the hash table if the refcount ever hits the maximum value." -- [9]

---

" There are also promising experiments with non-garbage-collected managed languages. Eg both val and roc feature deterministic memory management and c/rust-like levels of control over memory layout while still preserving the simple experience of a garbage-collected language. I think it's likely that we'll see at least one production-ready language in this vein within the next decade. "-- https://www.scattered-thoughts.net/writing/how-safe-is-zig/

https://www.val-lang.dev/ https://www.roc-lang.org/

---

Oil shell had a lot of problems with their GC b/c they kept forgetting to notify the garbage collector about intermediate values that should have been GC roots [10]

Their current solution is to disallow garbage collection except at safe points (it used to be allowed at any allocation) and to only put safepoints in a few spots. Now only functions that are on the call stack as of the safepoint need to have correctly notified the GC about their roots. Most of these functions are automatically compiled from a higher-level-language, so now there is less chance of a mistaken omission from the GC roots: [11]

[12]

older: [13] (i assume the reference to MPS means this? https://www.ravenbrook.com/project/mps/master/manual/html/guide/overview.html#guide-overview )

---