Table of Contents for Programming Languages: a survey

Chapter: Interop


Lua 2.1 provided external programmers with calls to retain and release references to Lua objects, but sometimes programmers forgot to use them or misused them. They switched to an explicit stack-based model in Lua 4.0, which they report solved these problems.

" Conceptually, a lua_Object was a union type, since it could refer to any Lua value. Several scripting languages, including Perl, Python, and Ruby, still use a union type to represent their values in C. The main drawback of this representation is that it is hard to design a garbage collector for the language. Without extra information, the garbage collector cannot know whether a value has a reference to it stored as a union in the C code. Without this knowledge, the collector may collect the value, making the union a dangling pointer. Even when this union is a local variable in a C function, this C function can call Lua again and trigger garbage collection.

Ruby solves this problem by inspecting the C stack, a task that cannot be done in a portable way. Perl and Python solve this problem by providing explicit reference-count functions for these union values. Once you increment the reference count of a value, the garbage collector will not collect that value until you decrement the count to zero. However, it is not easy for the programmer to keep these reference counts right. Not only is it easy to make a mistake, but it is difficult to find the error later (as anyone who has ever de- bugged memory leaks and dangling pointers can attest). Fur- thermore, reference counting cannot deal with cyclic data structures that become garbage.

Lua never provided such reference-count functions. Be- fore Lua 2.1, the best you could do to ensure that an unan- chored lua_Object was not collected was to avoid calling Lua whenever you had a reference to such a lua_Object . (As long as you could ensure that the value referred to by the union was also stored in a Lua variable, you were safe.) Lua 2.1 brought an important change: it kept track of all lua_Object values passed to C, ensuring that they were not collected while the C function was active. When the C func- tion returned to Lua, then (and only then) all references to these lua_Object values were released, so that they could be collected. 16

More specifically, in Lua 2.1 a lua_Object ceased to be a pointer to Lua’s internal data structures and became an index into an internal array that stored all values that had to be given to C: typedef unsigned int lua_Object;

This change made the use of lua_Object reliable: while a value was in that array, it would not be collected by Lua. 16

A similar method is used by JNI to handle “local references”. When the C function returned, its whole array was erased, and the values used by the function could be collected if pos- sible. (This change also gave more freedom for implement- ing the garbage collector, because it could move objects if necessary; however, we did not followed this path.)

For simple uses, the Lua 2.1 behavior was very practi- cal: it was safe and the C programmer did not have to worry about reference counts. Each lua_Object behaved like a local variable in C: the corresponding Lua value was guar- anteed to be alive during the lifetime of the C function that produced it. For more complex uses, however, this simple scheme had two shortcomings that demanded extra mecha- nisms: sometimes a lua_Object value had to be locked for longer than the lifetime of the C function that produced it; sometimes it had to be locked for a shorter time.

The first of those shortcomings had a simple solution: Lua 2.1 introduced a system of references . The function lua_lock got a Lua value from the stack and returned a reference to it. This reference was an integer that could be used any time later to retrieve that value, using the lua_getlocked function. (There was also a lua_unlock function, which destroyed a reference.) With such refer- ences, it was easy to keep Lua values in non-local C vari- ables.

The second shortcoming was more subtle. Objects stored in the internal array were released only when the function returned. If a function used too many values, it could over- flow the array or cause an out-of-memory error. For instance, consider the following higher-order iterator function, which repeatedly calls a function and prints the result until the call returns nil:

void l_loop (void) { lua_Object f = lua_getparam(1); for (;;) { lua_Object res; lua_callfunction(f); res = lua_getresult(1); if (lua_isnil(res)) break; printf("%s\n", lua_getstring(res)); } }

The problem with this code was that the string returned by each call could not be collected until the end of the loop (that is, of the whole C function), thus opening the possibility of array overflow or memory exhaustion. This kind of error can be very difficult to track, and so the implementation of Lua 2.1 set a hard limit on the size of the internal array that kept lua_Object values alive. That made the error easier to track because Lua could say “too many objects in a C function” instead of a generic out-of-memory error, but it did not avoid the problem.

To address the problem, the API in Lua 2.1 offered two functions, lua_beginblock and lua_endblock , that cre- ated dynamic scopes (“blocks”) for lua_Object values; all values created after a lua_beginblock were removed from the internal array at the corresponding lua_endblock . However, since a block discipline could not be forced onto C programmers, it was all too common to forget to use these blocks. Moreover, such explicit scope control was a little tricky to use. For instance, a naive attempt to correct our previous example by enclosing the for body within a block would fail: we had to call lua_endblock just before the break , too. This difficulty with the scope of Lua objects persisted through several versions and was solved only in Lua 4.0, when we redesigned the whole API. "

" To further illustrate the new API, here is an implementa- tion of our loop example in Lua 4.0:

int l_loop (lua_State *L) { for (;;) { lua_pushvalue(L, 1); lua_call(L, 0, 1); if (lua_isnil(L, -1)) break; printf("%s\n", lua_tostring(L, -1)); lua_pop(L, 1); } return 0; }

To call a Lua function, we push it onto the stack and then push its arguments, if any (none in the example). Then we call lua_call , telling how many arguments to get from the stack (and therefore implicitly also telling where the function is in the stack) and how many results we want from the call. In the example, we have no arguments and expect one result. The lua_call function removes the function and its arguments from the stack and pushes back exactly the requested number of results. The call to lua_pop removes the single result from the stack, leaving the stack at the same level as at the beginning of the loop. For convenience, we can index the stack from the bottom, with positive indices, or from the top, with negative indices. In the example, we use index -1 in lua_isnil and lua_tostring to refer to the top of the stack, which contains the function result. " --

Lua also introduced multiple, independent Lua states and made these explicit in the Lua 4.0 API.

"Support for multiple, independent Lua states was achieved by eliminating all global state. Until Lua 3.0, only one Lua state existed and it was imple- mented using many static variables scattered throughout the code. Lua 3.1 introduced multiple independent Lua states; all static variables were collected into a single C struct. An API function was added to allow switching states, but only one Lua state could be active at any moment. All other API functions operated over the active Lua state, which remained implicit and did not appear in the calls. Lua 4.0 introduced explicit Lua states in the API. " --

this necessitated a registry:

" The possibility of multiple states in Lua 4.0 created an un-expected problem for the reference mechanism. Previously, a C library that needed to keep some object fixed could cre-ate a reference to the object and store that reference in a global C variable. In Lua 4.0, if a C library was to work with several states, it had to keep an individual reference for each state and so could not keep the reference in a global C vari- able. To solve this difficulty, Lua 4.0 introduced the registry , which is simply a regular Lua table available to C only. With the registry, a C library that wants to keep a Lua object can choose a unique key and associate the object with this key in the registry. Because each independent Lua state has its own registry, the C library can use the same key in each state to manipulate the corresponding object. We could quite easily implement the original reference mechanism on top of the registry by using integer keys to represent references. To create a new reference, we just find an unused integer key and store the value at that key. Retriev- ing a reference becomes a simple table access. However, we could not implement weak references using the registry. So, Lua 4.0 kept the previous reference mechanism. In Lua 5.0, with the introduction of weak tables in the language, we were finally able to eliminate the reference mechanism from the core and move it to a library. "


" rwbarton

...there are two (kinds of) stacks in a running program, the Haskell stack (one per Haskell thread) and the C stack (one per capability/OS thread, so just one if not using the threaded runtime). The C stack(s) are set up in the normal way as for a plain old C program; they aren't especially small, or allocated on the heap, or anything like that, and on x86 the esp/rsp register points to the top of the C stack. The Haskell stack is allocated on the heap as a linked list of small chunks and during the execution of Haskell code, some other register (I forget which) is used to track the top of the Haskell stack.

For unsafe FFI calls and LLVM instrinsic fallback function calls, GHC simply emits a call instruction, using the C stack. This is okay because GHC's cooperative multitasking triggers on heap checks that appear only in Haskell code. There's no way for a single capability to ever be in the middle of two unsafe FFI calls, so we just need one "large" stack (the C stack) per capability.

My guess is that Rust uses the C stack (as in, a stack pointed to by esp/rsp) for its native code, so it needs every Rust-thread stack to be large enough to meet the space requirements of FFI/LLVM intrinsic function calls. ...


Yes, this is all of this is correct as far as I understand, including the bit about Rust sharing its stack for foreign calls (so they're in a tougher spot here.)


GHC keeps a separate Haskell stack aside from a C stack for foreign calls, as rwbarton said. This makes foreign calls much harder for Rust with these dynamic stacks since it needs to be big enough. "

-- [1]