proj-oot-ootAssemblyNotes28

i'm still wondering where to put stuff like thread control, a scheduler, and IPC primitives like those seen in microkernels. Also, i'm thinking one layer should support only cooperative multitasking and only the next layer up from that should support preemptive multitasking, but which layers?

OVM should have preemptive multitasking provided, so if cooperative multitasking comes first, it has to be, at the highest, somewhere in the layer below OVM (even if just as a library).

BootX provides a bunch of optional platform primitives, plus instructions that can be simply implemented as a few macroinstructions, so if scheduling is going to be programmed de novo by our toolchain on some platforms, then it should be on a layer higher than BootX.

Which leaves LOVM as the only option. The issue there is that a big part of the point of LOVM is that it's too annoying to write e.g. a garbage collector directly in Boot (or Boot+BootX). And the same thing applies to a scheduler.

hmm, i guess though that if it's a library in LOVM then that objection doesn't apply -- the library can itself be written in LOVM.

so it's looking like this stuff should be in a library in LOVM.
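
A minimal sketch of what "cooperative multitasking as a library" could look like, written in Python rather than LOVM purely for illustration. The names `run_round_robin` and `worker` are made up for this example; tasks are generators that give up control by yielding, and nothing here preempts a task that never yields.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks until all finish; return the values they yielded, in order."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            # Resume the task; it runs until it voluntarily yields.
            trace.append(next(task))
        except StopIteration:
            # The task finished, so it is simply not re-queued.
            continue
        # The task yielded, so it goes to the back of the ready queue.
        ready.append(task)
    return trace


def worker(name, steps):
    """Hypothetical task: does `steps` units of work, yielding after each one."""
    for i in range(steps):
        yield f"{name}{i}"
```

A preemptive layer above this would need some extra mechanism (a timer or an instruction budget) to take control back from a task that does not yield, which is why it fits better one layer up.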

---

regarding smallstack size:

" Most Microchip PIC 8-bit micros have a hardware stack with a depth of only 8! (the size will vary for different PIC devices). Because the stack depth on these micros is so small it is used only for function calls. Each function call will consume one level of the hardware stack. The rest of the variables are pushed into a software stack which is automatically handled by the compiler....Your microcontroller (PIC16F1709) has a 16-level hardware stack, which is a fairly good depth." -- [1]

---

yknow, actually, let's take the macros out of the LOVM assembly and put them in Lo only.

done.

---

can we do this thing called 'NaN tagging' that wren does? It apparently allows you to have a uniform 8-byte representation for 32-bit ints, 64-bit doubles, and x86-64 pointers, avoiding the need to box floats, and so avoiding either pointer indirection or >8-byte representations:

" A compact value representation

A core piece of a dynamic language implementation is the data structure used for variables. It needs to be able to store (or reference) a value of any type, while also being as compact as possible. Wren uses a technique called NaN tagging for this.

All values are stored internally in Wren as small, eight-byte double-precision floats. Since that is also Wren’s number type, in order to do arithmetic, no conversion is needed before the “raw” number can be accessed: a value holding a number is a valid double. This keeps arithmetic fast.

To store values of other types, it turns out there’s a ton of unused bits in a NaN double. You can stuff a pointer for heap-allocated objects, with room left over for special values like true, false, and null. This means numbers, bools, and null are unboxed. It also means an entire value is only eight bytes, the native word size on 64-bit machines. Smaller = faster when you take into account CPU caching and the cost of passing values around. " -- https://wren.io/performance.html#a-compact-value-representation

that page links to http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations for further explanation, which explains that NaNs have 53 bits free, and x86-64 pointers have only 48 usable bits.

i don't want to hardwire it in, i just want to make it possible to do this.
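
To make the idea concrete, here is a rough sketch of NaN tagging in Python, working on raw 64-bit patterns. The layout is an assumption made up for this example, not Wren's actual one: exponent all ones plus the quiet bit marks a NaN, the next 3 bits are a tag, and the low 48 bits are the payload. Tag 0 is reserved for a genuine NaN, and `TAG_INT32` / `TAG_PTR` are hypothetical tags.

```python
import math
import struct

QNAN = 0x7FF8_0000_0000_0000       # exponent all ones + quiet bit
TAG_SHIFT = 48
TAG_MASK = 0x7                     # 3 tag bits just below the quiet bit
PAYLOAD_MASK = (1 << 48) - 1       # low 48 bits, enough for an x86-64 pointer
TAG_INT32, TAG_PTR = 1, 2          # hypothetical tags; tag 0 means "real NaN"


def box_double(x):
    """Return the 64-bit pattern of a double; NaNs are canonicalized so they never look tagged."""
    if math.isnan(x):
        return QNAN
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_tagged(tag, payload):
    """Pack a non-double value into the unused bits of a quiet NaN."""
    assert 1 <= tag <= TAG_MASK and 0 <= payload <= PAYLOAD_MASK
    return QNAN | (tag << TAG_SHIFT) | payload


def box_int32(n):
    return box_tagged(TAG_INT32, n & 0xFFFF_FFFF)


def box_ptr(addr):
    return box_tagged(TAG_PTR, addr)


def unbox(bits):
    """Decode a 64-bit pattern into a (kind, value) pair."""
    tag = (bits >> TAG_SHIFT) & TAG_MASK
    if (bits & QNAN) != QNAN or tag == 0:
        # Not a tagged NaN: the bits are already a valid double.
        return ("double", struct.unpack("<d", struct.pack("<Q", bits))[0])
    payload = bits & PAYLOAD_MASK
    if tag == TAG_INT32:
        n = payload & 0xFFFF_FFFF
        # Sign-extend the 32-bit two's-complement payload.
        return ("int32", n - (1 << 32) if n & 0x8000_0000 else n)
    if tag == TAG_PTR:
        return ("ptr", payload)
    raise ValueError(f"unknown tag {tag}")
```

The point for us is only that nothing in the lower layers should prevent this: it needs 64-bit values whose bits can be inspected as integers and also used directly as doubles.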

---

"What kind of CPU features required for operating system?

Privilege protections? Virtual address? Interrupt?

...

Based on this experience, I made the draft specifications of the interrupt and virtual address translation for our homebrew CPU. In order to keep it simple, we decided to omit hardware privilege mechanisms like Ring protection. ... I added interrupt simulation capability to our simulator which Wataru had made in the core part of CPU experiments, and also completed support for virtual address translation. This gave the simulator enough functionality to run the OS. ... When I ported Xv6 to MIPS, I had GDB, so it was rather OK, but our own simulator didn’t have any debug features, so it must have been very difficult to debug. Shohei couldn’t bear the difficulty of debugging, so he added a disassembler and a debug dump function to the simulator. After this, the simulator’s debugging features were rapidly upgraded by the OS team, and finally the simulator grew to look like the following picture.

https://fuel.edby.coffee/images/simulator.png " -- https://fuel.edby.coffee/posts/how-we-ported-xv6-os-to-a-home-built-cpu-with-a-home-built-c-compiler/

---

For call3, if there are 4 operands, then you have to pass them in eight registers (or maybe eight positions on the small stack) because for each operand you need to pass both the value and the address to allow for all the various addressing modes. Alternately, just the address of the operand is passed; if the caller provides an immediate value then the implementation copies it into otherwise inaccessible memory, and if the caller provides a register value, then the implementation copies it into otherwise inaccessible memory and copies it back into the register at the end of the call (we don't have to worry about it being in the register in the middle of the call if something else is called, because instructions are "atomic").
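
A rough sketch of the address-only alternative, in Python for illustration. Everything here is hypothetical: the operand encoding (`"imm"`, `"reg"`, `"mem"`), the `SCRATCH_BASE` region standing in for "otherwise inaccessible memory", and the `add_into_first` handler.

```python
SCRATCH_BASE = 1000  # hypothetical region that user code cannot otherwise address


def call3(handler, operands, regs, mem):
    """Invoke handler with one address per operand, whatever the addressing mode."""
    addrs = []
    writebacks = []
    for slot, (mode, x) in enumerate(operands):
        if mode == "mem":
            # Memory operands already have an address.
            addrs.append(x)
            continue
        scratch = SCRATCH_BASE + slot
        if mode == "imm":
            # Immediates are copied into scratch memory so they have an address.
            mem[scratch] = x
        elif mode == "reg":
            # Register values are copied in, and copied back after the call.
            mem[scratch] = regs[x]
            writebacks.append((x, scratch))
        else:
            raise ValueError(f"unknown addressing mode {mode!r}")
        addrs.append(scratch)
    handler(mem, *addrs)
    # The instruction is treated as atomic, so registers only need to be
    # made consistent once, at the end.
    for reg, scratch in writebacks:
        regs[reg] = mem[scratch]


def add_into_first(mem, dst, a, b):
    """Hypothetical handler: store the sum of the last two operands into the first."""
    mem[dst] = mem[a] + mem[b]
```

With this convention the handler sees half as many arguments as in the value-plus-address scheme, at the cost of the copies in and out of scratch memory.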

so if you have two register banks and two small stacks, then you need four extra arguments to CALL to say how many things need to be saved, or, on the other hand, four extra arguments to ENTRY to say how many callee-saved things need to be saved. Or you could just say that SMALLSTACK is caller-saved, and then not provide facilities for saving in the caller, which means you'd only have two things to specify, and only upon ENTRY.

alternately, you could specify in ENTRY how many SMALLSTACK locations in each bank will be needed, and then a primitive could be provided to free that many locs in SMALLSTACK one way or another; maybe by popping stuff from the top of the stack, or maybe by spilling stuff from the bottom of the stack; the convention would be that you can't assume that you can access the caller's stack from the callee (negating the opportunity to use the stack to pass arguments through many levels of calls).
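
Here is a small sketch of the "spill from the bottom" variant of that primitive, in Python for illustration. The class and method names (`SmallStack`, `reserve`, `unspill`) are invented for this example, and the spill area is just a list standing in for main memory.

```python
class SmallStack:
    """Fixed-capacity stack whose oldest entries can be spilled to backing memory."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = []      # index 0 is the bottom, the last element is the top
        self.spilled = []    # entries evicted from the bottom, oldest first

    def push(self, value):
        if len(self.slots) == self.capacity:
            raise OverflowError("smallstack full")
        self.slots.append(value)

    def pop(self):
        return self.slots.pop()

    def reserve(self, n):
        """Guarantee n free slots by spilling from the bottom; return how many were spilled."""
        if n > self.capacity:
            raise ValueError("cannot reserve more than the stack capacity")
        excess = max(0, len(self.slots) + n - self.capacity)
        self.spilled.extend(self.slots[:excess])
        del self.slots[:excess]
        return excess

    def unspill(self, count):
        """Reload the `count` most recently spilled entries underneath the current contents."""
        if count == 0:
            return
        if len(self.slots) + count > self.capacity:
            raise OverflowError("not enough room to unspill")
        self.slots[:0] = self.spilled[-count:]
        del self.spilled[-count:]
```

In this sketch a callee would call `reserve(n)` on ENTRY and `unspill(count)` with the returned count just before returning. Because some of the caller's entries may have been spilled, the callee cannot assume it can see them, which matches the convention above.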

---

some other ideas for SMALLSTACK:

---

Regarding the advantage of multiple stacks, the stack computers book just says "In the case where the parameter stack is separate from the return address stack, software may pass a set of parameters through several layers of subroutines with no overhead for recopying the data into new parameter lists.

An important advantage of having multiple stacks is one of speed. Multiple stacks allow access to multiple values within a clock cycle. As an example, a machine that has simultaneous access to both a data stack and a return address stack can perform subroutine calls and returns in parallel with data operations." [2]. Neither of those is too important to us (the first is provided by argument registers, and the second is lower-level than we care about).

---

reflecting on the previous section, i think that for us, the only important advantages of SMALLSTACK are:

---

more notes on how many registers we need/ how large smallstack should be

I keep coming back to register and stack sizes of 16. If we had two register banks and two stack banks, and each of those were of size 16, then we'd have 64 locations total, which is exactly 1/4 of 256, allowing easy addressing of registers by the implementation if everything is 32 bits. If each register bank is of size 16, that leaves eight for caller-save registers and eight for callee-save.

the RISC-V C extension has shortcuts for the eight most popular registers (x8-x15).

The RISC-V base has 32 int registers, and the E variant (RV32E) has 16.
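
One reading of the "1/4 of 256" remark, sketched in Python: with four banks of 16, a location number fits in 6 bits (0..63), and if each location is 32 bits (4 bytes) then the byte offset of any location fits in a single byte (0..252). The bank names and function names below are hypothetical.

```python
BANKS = ("reg0", "reg1", "stack0", "stack1")  # hypothetical names for the four banks
BANK_SIZE = 16


def encode_loc(bank, index):
    """Map (bank, index) to a single location number in 0..63."""
    if not 0 <= index < BANK_SIZE:
        raise ValueError("index out of range")
    return BANKS.index(bank) * BANK_SIZE + index


def decode_loc(code):
    """Inverse of encode_loc."""
    bank, index = divmod(code, BANK_SIZE)
    return BANKS[bank], index


def byte_offset(code, word_bytes=4):
    """Byte offset of a location within the implementation's register file."""
    return code * word_bytes
```

So an implementation could keep all four banks in one 256-byte block and address any location with a one-byte offset.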

---

"As a practical matter, a stack size of 32 will eliminate stack buffer overflows for almost all programs." -- [3]