Revision 11 not available (showing current revision instead)

proj-plbook-plChSingleIntermedLangs

Table of Contents for Programming Languages: a survey

Other intermediate target languages designed for implementing a single high-level language


CPython bytecode

Two stacks:

Stack ops:

Arithmetic:

instructions:

quoted and paraphrased from https://docs.python.org/release/3.3.2/library/dis.html

General instructions:

Unary operations:

Unary operations take the top of the stack, apply the operation, and push the result back on the stack.

Binary operations:

Binary operations remove the top of the stack (TOS) and the second top-most stack item (TOS1) from the stack. They perform the operation, and put the result back on the stack.

In-place operations:

In-place operations are like binary operations, in that they remove TOS and TOS1, and push the result back on the stack, but the operation is done in-place when TOS1 supports it, and the resulting TOS may be (but does not have to be) the original TOS1.

INPLACE_POWER, INPLACE_MULTIPLY, INPLACE_FLOOR_DIVIDE, INPLACE_TRUE_DIVIDE, INPLACE_MODULO, INPLACE_ADD, INPLACE_SUBTRACT, INPLACE_LSHIFT, INPLACE_RSHIFT, INPLACE_AND, INPLACE_XOR, INPLACE_OR, STORE_SUBSCR (TOS1[TOS] = TOS2), DELETE_SUBSCR (del TOS1[TOS]),

Miscellaneous opcodes:

PRINT_EXPR

Miscellaneous opcodes with arguments:

All of the following opcodes expect arguments. An argument is two bytes, with the more significant byte last.

todo

    Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.

CALL_FUNCTION_VAR(argc)

    Calls a function. argc is interpreted as in CALL_FUNCTION. The top element on the stack contains the variable argument list, followed by keyword and positional arguments.

CALL_FUNCTION_KW(argc)

    Calls a function. argc is interpreted as in CALL_FUNCTION. The top element on the stack contains the keyword arguments dictionary, followed by explicit keyword and positional arguments.

CALL_FUNCTION_VAR_KW(argc)

    Calls a function. argc is interpreted as in CALL_FUNCTION. The top element on the stack contains the keyword arguments dictionary, followed by the variable-arguments tuple, followed by explicit keyword and positional arguments.

Note: as of Python 3.6, "The Python interpreter now uses a 16-bit wordcode instead of bytecode which made a number of opcode optimizations possible. (Contributed by Demur Rumed with input and reviews from Serhiy Storchaka and Victor Stinner in bpo-26647 and bpo-28050.)" [1].

(rationale: "The change is to have all instructions take an argument. This removes the branch on each instruction on whether to load oparg. It then also aligns instructions to always be 2 bytes rather than 1 or 3 by having arguments only take up 1 byte. In the case that an argument to an instruction is greater than 255, it can chain EXTENDED_ARG up to 3 times. In practice this rarely occurs, mostly only for jumps, & abarnert measured stdlib to be ~5% smaller

The rationale is that this offers 3 benefits: Smaller code size, simpler instruction iteration/indexing (One may now scan backwards, as peephole.c does in this patch), which between the two results in a small perf gain (the only way for perf to be negatively impacted is by an increase in EXTENDED_ARGs, when I post benchmarking I'll also post a count of how many more EXTENDED_ARGs are emitted)

This also means that if I want to create something like a tracer that tracks some information for each instruction, I can allocate an array of codesize/2 bytes, then index off of half the instruction index. This isn't currently done in peephole.c, nor does this include halving jump opargs

" ) -- [2]

Links:


Dalvik VM

todo http://www.netmite.com/android/mydroid/dalvik/docs/dalvik-bytecode.html

(todo is this wrong?) interestingly, Dalvik only has 16 registers.

however:

" The Dalvik instruction set implements an interesting compromise: it is register based, but there are a finite number of them as opposed to the theoretically infinite registers of LLVM or Parrot. Dalvik supports 65,536 registers, a vast number compared to hardware CPUs and presumably sufficient to implement SSA (if desired) in reasonably large functions.

Even more interestingly, not all Dalvik instructions can access all registers. Many Dalvik instructions dedicate 4 bits to the register number, requiring their operands to be stored in the first 16 registers. A few more instructions have an 8 bit instruction number, to access the first 256. There are also instructions to copy the value to or from any of the 65,536 registers to a low register, for a subsequent instruction to access.

It took a while to understand the rationale for this choice, and I'm still not confident I fully get it. Clearly the Dalvik designers believe that keeping data in one of the high registers will be faster than explicitly storing it to memory, even if the vast number of registers end up mostly residing in RAM. Addressing data as register numbers instead of memory addresses should make it easier for the VM to dynamically remap Dalvik registers to the real hardware registers. For example, if it can predict that virtual register 257 will likely be used in the near future it can be kept in a CPU register instead of being immediately stored to memory. " -- http://codingrelic.geekhold.com/2010/07/virtual-instruction-sets-opcode.html

Non-deprecated, non-implementation-specific instructions (many instructions have _2ADDR forms, and many also have _LIT8 and _LIT16 forms, and many have _WIDE forms, and many have _FLOAT, _INT, and _LONG forms, and many have _BOOLEAN, _BYTE, _CHAR, _OBJECT, _SHORT, and _WIDE forms, and many have _16 and _32 forms, and many of these are mixed, e.g. ADD_INT_2ADDR. I'll only list the base instruction in these cases)

ADD AGET AND APUT ARRAY_LENGTH CHECK_CAST CMPG CMPL CMP CONST CONST_CLASS CONST_HIGH16 CONST_STRING CONST_STRING_JUMBO DIV DOUBLE_TO_FLOAT DOUBLE_TO_INT DOUBLE_TO_LONG FILLED_NEW_ARRAY FILLED_NEW_ARRAY_RANGE FILL_ARRAY_DATA FLOAT_TO_DOUBLE FLOAT_TO_INT FLOAT_TO_LONG GOTO IF_EQ IF_EQZ IF_GE IF_GEZ IF_GT IF_GTZ IF_LE IF_LEZ IF_LT IF_LTZ IF_NE IF_NEZ IGET INSTANCE_OF INT_TO_BYTE INT_TO_CHAR INT_TO_DOUBLE INT_TO_FLOAT INT_TO_LONG INT_TO_SHORT INVOKE_DIRECT INVOKE_DIRECT_RANGE INVOKE_INTERFACE INVOKE_INTERFACE_RANGE INVOKE_STATIC INVOKE_STATIC_RANGE INVOKE_SUPER INVOKE_SUPER_RANGE INVOKE_VIRTUAL INVOKE_VIRTUAL_RANGE IPUT LONG_TO_DOUBLE LONG_TO_FLOAT LONG_TO_INT MONITOR_ENTER MONITOR_EXIT MOVE MOVE_16 MOVE_EXCEPTION MOVE_FROM16 MOVE_OBJECT MOVE_OBJECT_16 MOVE_OBJECT_FROM16 MOVE_RESULT MOVE_RESULT_OBJECT MOVE_RESULT_WIDE MOVE_WIDE MOVE_WIDE_16 MOVE_WIDE_FROM16 MUL NEG NEW_ARRAY NEW_INSTANCE NOP NOT OR PACKED_SWITCH REM RETURN RETURN_OBJECT RETURN_VOID RETURN_WIDE RSUB SGET SHL SHL SHR SPARSE_SWITCH SPUT SPUT SUB SUB THROW USHR XOR

Links:


Perl6

NQP

https://github.com/perl6/nqp

opcodes (about 268 of them): https://github.com/perl6/nqp/blob/master/docs/ops.markdown

MoarVM