proj-oot-ootMemoryManagementNotes3

sources of GC latency in golang, from https://tip.golang.org/doc/gc-guide :

- "Brief stop-the-world pauses when the GC transitions between the mark and sweep phases"
- "Scheduling delays because the GC takes 25% of CPU resources when in the mark phase"
- "User goroutines assisting the GC in response to a high allocation rate"
- "Pointer writes requiring additional work while the GC is in the mark phase"
- "Running goroutines must be suspended for their roots to be scanned"

so, the relevant ones for us are:

Getting to Go: The Journey of Go's Garbage Collector mentions that:

---

i don't think we'll do this, but it is interesting, b/c that's not what i thought 'reference counting' was:

randyrand 26 days ago

I feel like the most user friendly solution for Use After Free is to just use Reference Counting. Basically, just copy Objective-C's "weak_ptr" design.

For every allocation, also store on the heap a "reference object" that keeps track of this new reference.

  struct reference_t {
      // the ptr returned by malloc
      void* ptr;
      // the stack/heap location that ptr was written to,
      // i.e. the reference location
      void* referenceAddr;
  };

Reference tracking: every time ptr is copied or overwritten, create and delete reference objects as needed.

If ptr is freed, visit every reference and set *referenceAddr = NULL, turning all of these references into NULL pointers.
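
a minimal sketch of that scheme in C (my names, not randyrand's; referenceAddr is a void** here so the null-out assignment typechecks, and it assumes a single-threaded runtime that keeps one linked list of reference objects per allocation):

  #include <stdlib.h>

  // one node per location that currently holds a pointer to the allocation
  typedef struct reference_t {
      void* ptr;                 // the ptr returned by malloc
      void** referenceAddr;      // the stack/heap location that ptr was written to
      struct reference_t* next;  // next reference to the same allocation
  } reference_t;

  // called on every pointer write: record that *referenceAddr now holds ptr
  reference_t* track_reference(reference_t* head, void* ptr, void** referenceAddr) {
      reference_t* r = malloc(sizeof(reference_t));
      r->ptr = ptr;
      r->referenceAddr = referenceAddr;
      r->next = head;
      return r;
  }

  // on free: null out every tracked reference, then release the tracking nodes
  void free_and_null_references(void* ptr, reference_t* head) {
      while (head != NULL) {
          reference_t* next = head->next;
          *head->referenceAddr = NULL;  // dangling pointer becomes a NULL pointer
          free(head);
          head = next;
      }
      free(ptr);
  }

note the cost: every pointer copy or overwrite allocates or frees a tracking node, so plain pointer writes get much more expensive.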

---

maybe ask somewhere for advice:

Choosing an automatic memory management scheme for a high-level programming language.

I'm looking for a simple automatic memory management scheme with short pause times. Most programming language implementations that I see aim for performant memory management at the cost of being complex, but I care more about simplicity. The one exception is that I do care about having short pause times.

Reference counting or tracing garbage collection or both are okay (if there is a third option, that's okay too); but if reference counting, some sort of cycle detection should be in there too. The high-level language will have a runtime, so it's okay to require that the runtime do things like:

- box values and store GC metadata with each box
- have read barriers or write barriers or both
- do reference counting, or some other computation, whenever memory is accessed or changed

The developer using this high-level programming language should not have to think about memory management at all, so stuff like what Rust does is out of scope.

Criteria:

My best guess right now is that I should use reference counting plus a cycle detector, and a write barrier that is on only when the cycle detector is confirming a cycle, but I could be wrong.

What do you recommend?

(later: well i think a 'cycle detector' would have to be a full tracing GC, like Python's, b/c what if each node in a cycle participates in multiple cycles? It's not as simple as looking for nodes with refcount 1, following outgoing references from them, and finding a cycle)
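
for reference, a rough sketch of the trial-deletion idea (roughly what CPython's cycle collector does; my names, and it assumes the candidate set contains every container object, so all children of a candidate are themselves candidates, and that mark starts at 0 everywhere):

  #include <stddef.h>

  // hypothetical object header for a refcounted runtime
  typedef struct obj {
      int refcount;              // the real reference count
      int gc_refs;               // scratch copy used during cycle detection
      int mark;                  // set if proven externally reachable
      size_t nchildren;
      struct obj** children;     // outgoing references
  } obj;

  // pass 1: copy refcounts, then subtract every reference that originates
  // inside the candidate set; what's left in gc_refs is the number of
  // references coming from outside the set
  static void subtract_internal_refs(obj** cand, size_t n) {
      for (size_t i = 0; i < n; i++) cand[i]->gc_refs = cand[i]->refcount;
      for (size_t i = 0; i < n; i++)
          for (size_t j = 0; j < cand[i]->nchildren; j++)
              cand[i]->children[j]->gc_refs--;
  }

  // pass 2: anything still externally referenced, plus everything reachable
  // from it, is live
  static void mark_reachable(obj* o) {
      if (o->mark) return;
      o->mark = 1;
      for (size_t j = 0; j < o->nchildren; j++)
          mark_reachable(o->children[j]);
  }

  void collect_cycles(obj** cand, size_t n) {
      subtract_internal_refs(cand, n);
      for (size_t i = 0; i < n; i++)
          if (cand[i]->gc_refs > 0) mark_reachable(cand[i]);
      for (size_t i = 0; i < n; i++)
          if (!cand[i]->mark) { /* cand[i] is cyclic garbage; reclaim it */ }
  }

so yes, it's a full trace over the candidate set, not a local walk from low-refcount nodes.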

shorter version:

Choosing an automatic memory management scheme for a programming language.

I'm looking for a simple automatic memory management scheme with short pause times. Most programming language implementations that I see aim for performant memory management at the cost of being complex, but I care more about simplicity. The one exception is that I do care about having short pause times.

This will be a high-level programming language with a runtime. The programmer should not have to think about memory management much, if at all (therefore the programmer should not have to do Rust-style thinking about ownership). The runtime will know where pointers are and is allowed to do things like box values and store GC metadata there, have write barriers and/or read barriers, reference count, etc.

Criteria:

What do you recommend?

---

I'm looking for recommendations for a simple automatic memory management scheme for a programming language. Reference-counting and/or tracing garbage collection are both options (I'm open to other suggestions too). I care about short pause times.

Unlike many programming language implementations that I see, which are performant at the cost of a complex implementation, I care most about simplicity of implementation, and am willing to accept poor performance (except for my concern about pause times). This is all relative; I will accept a little implementation complexity in exchange for a lot of performance. So the criteria are: short pause times > simple implementation > low memory usage > fast

I'm on the fence about compaction. I do want the user to be able to run programs for a long time without running out of memory. However, various contemporary language implementations, such as Golang (Go) and Python, have non-compacting GCs, and I don't hear a lot of complaints about people's programs crashing due to memory fragmentation. Is fragmentation only a theoretical issue that doesn't come up in practice -- or do those implementations have to do something complicated to avoid fragmentation, something that I could avoid doing if I do compaction?

This will be a high-level programming language with a runtime. The programmer should not have to think about memory management much, if at all (the language should detect reference cycles). The runtime will know where pointers are and could do things like box values and store GC metadata there, have write barriers and/or read barriers, reference count, etc. The memory management scheme should be portable and concurrent.

What do you recommend?

---

"The M-Machine’s limitation of power-of-two-sized objects was a show stopper for a lot of real-world things (at least 25% memory overhead, close to 50% in some cases). " -- [1]

---

Maas, Martin Christoph. Hardware and Software Support for Managed-Language Workloads in Data Centers. PhD dissertation, 2017. https://escholarship.org/uc/item/1qv7m47s

see sections 7.2 and 7.2.1 of https://escholarship.org/content/qt1qv7m47s/qt1qv7m47s.pdf

---

doug-moen 24 hours ago

RC in C++ using std::shared_ptr is a different proposition than RC in a memory safe language that is designed for safe and seamless RC. In the latter case, the compiler knows about RC and can do a lot of tricks to minimize cost.

High performance GC uses a lot of tricks to achieve performance. If the same amount of work is put into optimizing RC, using an optimizing compiler that emits different kinds of refcount management code in different situations, then you can do far better than what’s possible using a smart pointer class in C++.

In C++ using std::shared_ptr, it’s true that RC is all done locally and you can’t stop it halfway. That isn’t true in a compiler supported and optimized RC runtime. You can use lazy RC, for example. Deferred RC is also nonlocal.

Furthermore, with the approach I’m describing, you can eliminate the need for atomic operations for RC update in multithreaded code. There are several ways to do lock-free RC.

For more information, the Wikipedia article is a start but there’s a lot more information about techniques in papers that you can find with a Google Scholar search.

https://en.wikipedia.org/wiki/Reference_counting https://scholar.google.com/scholar?&q=reference%20counting
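
a sketch of one of the tricks doug-moen mentions, deferred RC with a zero-count table (my illustration, not his; stack_references and reclaim are assumed runtime hooks): stack and register references aren't counted at all, only heap-to-heap writes run a barrier, and objects whose count hits zero are only freed after a periodic stack scan confirms nothing on the stack still points at them:

  #include <stddef.h>
  #include <stdbool.h>

  // hypothetical object header for a deferred-RC runtime
  typedef struct obj {
      int heap_refcount;   // counts heap-to-heap references only; stack refs are not counted
  } obj;

  // zero-count table: heap refcount hit zero, but the stack may still reference these
  #define ZCT_SIZE 1024
  static obj* zct[ZCT_SIZE];
  static size_t zct_len = 0;

  // write barrier for heap slots only; stack writes need no barrier at all,
  // which is where the savings over naive RC come from
  void heap_store(obj** slot, obj* newval) {
      if (newval) newval->heap_refcount++;
      obj* old = *slot;
      *slot = newval;
      if (old && --old->heap_refcount == 0 && zct_len < ZCT_SIZE)
          zct[zct_len++] = old;   // defer: maybe garbage, maybe stack-referenced
  }

  bool stack_references(obj* o);   // assumed: the runtime's stack scanner
  void reclaim(obj* o);            // assumed: actually frees the object

  // run periodically: anything in the ZCT with no stack references is garbage
  void process_zct(void) {
      size_t kept = 0;
      for (size_t i = 0; i < zct_len; i++) {
          if (zct[i]->heap_refcount == 0 && !stack_references(zct[i]))
              reclaim(zct[i]);
          else
              zct[kept++] = zct[i];
      }
      zct_len = kept;
  }

the periodic ZCT sweep is where the nonlocality he mentions comes in: the cost of a refcount hitting zero is no longer paid at the write site.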

---

"AFAIK, the best modern state of art garbage collectors have stop-the-world pauses, proportional to size of root set and/or thread count." -- [2]

"alloc/free..can be doing all sorts of book keeping behind the scenes, including cascading destructor calls which could free a chain of objects the size of live memory." [3]

"The Garbage Collection Handbook originally published in 1996 defines garbage collection as: "All garbage collection schemes are based on one of four fundamental approaches; mark-sweep collection, copying collection, mark-compact collection or reference counting."" -- [4]