Table of Contents for Programming Languages: a survey
continued from [1]
http://stackoverflow.com/questions/6839737/are-there-any-c-to-bytecode-compilers
VM instructions (from cint-5.18.00/cint/src/bc_inst.h https://github.com/dawehner/root/blob/master/cint/cint/src/bc_inst.h ). In some cases i've written comments on what i think these instructions might mean. These are only guesses from skimming the source code very quickly, as i couldn't find any documentation (although some instructions had code comments, which i often copied into here):
void LD(G__value* pval);
void LD(int a); # push a copy of the value found at depth a in the data stack onto the data stack
void CL(void); # clear stack pointer; Take a possible breakpoint, and then clear the data stack, and the structure offset stack.
void OP2(int opr); # one of +,-,*,/,%,@,>>,<<,&,| (applied to the data stack)
int CNDJMP(int addr=0); # if TOS == 0, then jump to specified address
int JMP(int addr=0);
void POP(void);
void LD_FUNC(const char* fname,int hash,int paran,void* pfunc,
struct G__ifunc_table_internal* ifunc, int ifn); # fname is the string name of a fn; without a longer look, i can't tell if this pushes a reference to a partially applied function onto the data stack, or if it actually calls the function
void LD_FUNC_BC(struct G__ifunc_table* ifunc,int ifn,int paran,void *pfunc);
void LD_FUNC_VIRTUAL(struct G__ifunc_table* ifunc,int ifn,int paran,void *pfunc);
void RETURN(void); # half
void CAST(int type,int tagnum,int typenum,int reftype);
void CAST(G__TypeInfo& x);
void OP1(int opr); # one of +,-
void LETVVAL(void); # something about assigning and lvalues?
void ADDSTROS(int os); # ? adds 'os' (offset) to a variable named G__store_struct_offset; dunno what this is for
void LETPVAL(void); # ? i dunno, some kind of assignment? looks at the top two(?) items on the stack and pops the top one
void TOPNTR(void); # calls G__val2pointer on the TOS
void NOT(void); # applies NOT to TOS
void BOOL(void); # i think this coerces the TOS to bool
int ISDEFAULTPARA(int addr=0); # if the stack is not empty, then JMP to addr, otherwise continue
void LD_VAR(struct G__var_array* var,int ig15,int paran,int var_type); # i don't understand this at all, but i think this is about loading a variable, which may be an array, into TOS? It first consumes a quantity of 'paran' parameters from the stack, but afaict it then ignores these? Then in the call to G__getvariable, the first parameter, which looks like it is supposed to be a variable name, is always the empty string, which looks like it should cause G__getvariable, in https://github.com/dawehner/root/blob/ed4cde23329c21e940754a56a7195ed082273229/cint/tool/ifdef/get.c , to return the empty string? So i'm confused.
void ST_VAR(struct G__var_array* var,int ig15,int paran,int var_type);
void LD_MSTR(struct G__var_array* var,int ig15,int paran,int var_type);
void ST_MSTR(struct G__var_array* var,int ig15,int paran,int var_type);
void LD_LVAR(struct G__var_array* var,int ig15,int paran,int var_type);
void ST_LVAR(struct G__var_array* var,int ig15,int paran,int var_type);
void CMP2(int operator2);
void PUSHSTROS(void);
void SETSTROS(void);
void POPSTROS(void);
void SETTEMP(void);
void FREETEMP(void);
void GETRSVD(const char* item);
void REWINDSTACK(int rewind);
int CND1JMP(int addr=0);
private:
void LD_IFUNC(struct G__ifunc_table* p_ifunc,int ifn,int hash,int paran,int funcmatch,int memfunc_flag);
public:
void NEWALLOC(int size,int isclass_array);
void SET_NEWALLOC(int tagnum,int var_type);
void SET_NEWALLOC(const G__TypeInfo& type);
void DELETEFREE(int isarray);
void SWAP();
void BASECONV(int formal_tagnum,int baseoffset);
void STORETEMP(void);
void ALLOCTEMP(int tagnum);
void POPTEMP(int tagnum);
void REORDER(int paran,int ig25);
void LD_THIS(int var_type);
void RTN_FUNC(int isreturn);
void SETMEMFUNCENV(void);
void RECMEMFUNCENV(void);
void ADDALLOCTABLE(void);
void DELALLOCTABLE(void);
void BASEDESTRUCT(int tagnum,int isarray);
void REDECL(struct G__var_array* var,int ig15);
void TOVALUE(G__value* pbuf);
void INIT_REF(struct G__var_array* var,int ig15,int paran,int var_type);
void PUSHCPY(void);
void LETNEWVAL(void);
void SETGVP(int pushpop);
void TOPVALUE(void);
void CTOR_SETGVP(struct G__var_array* var,int ig15,int mode);
int TRY(int first_catchblock=0,int endof_catchblock=0);
void TYPEMATCH(G__value* pbuf);
void ALLOCEXCEPTION(int tagnum);
void DESTROYEXCEPTION(void);
void THROW(void);
void CATCH(void);
void SETARYINDEX(int newauto);
void RESETARYINDEX(int newauto);
void GETARYINDEX(void);
void PAUSE();
void NOP(void);
// new instructions
void ENTERSCOPE(void);
void EXITSCOPE(void);
void PUTAUTOOBJ(struct G__var_array* var,int ig15);
void CASE(void* x);
/* void SETARYCTOR(int num); */
void MEMCPY();
void MEMSETINT(int mode,map<long,long>& x);
int JMPIFVIRTUALOBJ(int offset,int addr=0);
void VIRTUALADDSTROS(int tagnum,struct G__inheritance* baseclass,int basen);
void cancel_VIRTUALADDSTROS();
see also https://github.com/dawehner/root/blob/master/cint/cint/src/bc_exec_asm.h for some code that executes (some of?) these (and other things generated by the optimizer?)
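To make the guessed semantics above a bit more concrete, here is a minimal sketch of a stack machine with a handful of similarly named instructions (LD, OP2, CNDJMP, JMP, POP). This is not CINT's implementation: the opcode encoding, `struct insn`, `run` and `demo` are all made up for illustration, and the behaviour follows only the guesses in the comments above. In particular it assumes that LD(G__value*) pushes a constant and that CNDJMP pops the value it tests, neither of which has been checked against the source.

```c
#include <assert.h>

/* Hypothetical opcodes: names echo a few bc_inst.h entries, the encoding is invented. */
enum op { OP_LDC, OP_LD, OP_OP2, OP_CNDJMP, OP_JMP, OP_POP, OP_HALT };

struct insn { enum op op; long arg; };

#define STACK_MAX 64

/* Runs code from address 0 until OP_HALT; returns the top of the data stack. */
long run(const struct insn *code) {
    long stack[STACK_MAX];
    int sp = 0;   /* number of values currently on the data stack */
    int pc = 0;   /* index of the next instruction */

    for (;;) {
        struct insn in = code[pc++];
        switch (in.op) {
        case OP_LDC:      /* assumed meaning of LD(G__value*): push a constant */
            assert(sp < STACK_MAX);
            stack[sp++] = in.arg;
            break;
        case OP_LD:       /* like LD(int a): push a copy of the value at depth a */
            assert(in.arg >= 0 && in.arg < sp && sp < STACK_MAX);
            stack[sp] = stack[sp - 1 - in.arg];
            sp++;
            break;
        case OP_OP2: {    /* apply a binary operator to the top two values */
            long a, b;
            assert(sp >= 2);
            b = stack[--sp];
            a = stack[--sp];
            switch ((int)in.arg) {
            case '+': stack[sp++] = a + b; break;
            case '-': stack[sp++] = a - b; break;
            case '*': stack[sp++] = a * b; break;
            default:  assert(!"operator not handled in this sketch");
            }
            break;
        }
        case OP_CNDJMP:   /* if TOS == 0, jump; this sketch also pops TOS */
            assert(sp >= 1);
            if (stack[--sp] == 0) pc = (int)in.arg;
            break;
        case OP_JMP:      /* unconditional jump */
            pc = (int)in.arg;
            break;
        case OP_POP:      /* discard TOS */
            assert(sp >= 1);
            sp--;
            break;
        case OP_HALT:     /* not a CINT instruction; added so the sketch can stop */
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}

/* Demo: (2 + 3) * 4 is 20; 20 - 20 is 0, so the CNDJMP at address 10 is taken
   and 222 (not 111) is added to the 20 left on the stack. */
long demo(void) {
    static const struct insn program[] = {
        {OP_LDC, 99}, {OP_POP, 0},                     /* 0-1: push and discard */
        {OP_LDC, 2}, {OP_LDC, 3}, {OP_OP2, '+'},       /* 2-4 */
        {OP_LDC, 4}, {OP_OP2, '*'},                    /* 5-6 */
        {OP_LD, 0}, {OP_LDC, 20}, {OP_OP2, '-'},       /* 7-9 */
        {OP_CNDJMP, 13},                               /* 10 */
        {OP_LDC, 111}, {OP_JMP, 14},                   /* 11-12: skipped here */
        {OP_LDC, 222},                                 /* 13 */
        {OP_OP2, '+'},                                 /* 14 */
        {OP_HALT, 0}                                   /* 15 */
    };
    return run(program);
}
```

Calling `demo()` returns 242 under these assumptions. The real interpreter works on G__value objects and keeps a separate structure offset stack, which this sketch leaves out entirely.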
" CIL (for C Intermediate Language) sits in between GENERIC and GIMPLE: control structures are kept and side-effects are hoisted out of expression trees by introducing instructions. It is originally the intermediate language used in safe compiler CCured. While GENERIC is best suited for source code syntactic analyses and GIMPLE for optimization, CIL is ideally suited for source code semantic analyses. As such, it is the frontend of many analyses for C programs, ranging from type-safe compilation to symbolic evaluation and slicing. " -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 42
"CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous constructs and removing redundant ones, and also higher-level than typical intermediate languages designed for compilation, by maintaining types and a close relationship with the source program. The main advantage of CIL is that it compiles all valid C programs into a few core constructs with a very clean semantics...CIL features a reduced number of syntactic and conceptual forms. For example, all looping constructs are reduced to a single form, all function bodies are given explicit return statements, syntactic sugar like "->" is eliminated and function arguments with array types become pointers. (For an extensive list of how CIL simplifies C programs, see Section 4.) This reduces the number of cases that must be considered when manipulating a C program. CIL also separates type declarations from code and flattens scopes within function bodies. This structures the program in a manner more amenable to rapid analysis and transformation. CIL computes the types of all program expressions, and makes all type promotions and casts explicit. CIL supports all GCC and MSVC extensions except for nested functions and complex numbers. Finally, CIL organizes C’s imperative features into expressions, instructions and statements based on the presence and absence of side-effects and control-flow. Every statement can be annotated with successor and predecessor information. Thus CIL provides an integrated program representation that can be used with routines that require an AST (e.g. type-based analyses and pretty-printers), as well as with routines that require a CFG (e.g., dataflow analyses). CIL also supports even lower-level representations (e.g., three-address code), see Section 8." [2]
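A few of the simplifications named in the quotation above (a single loop form, no "->", explicit casts for promotions, declarations separated from code) can be sketched by hand. The second function below is written in the spirit of CIL but is not actual CIL output; the names `node`, `sum_list`, `sum_list_simplified` and `check_sum_list` are made up for the example, and the exact shape CIL would produce may differ.

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* Ordinary C: a for loop, "->", and an implicit int-to-long promotion. */
long sum_list(const struct node *head) {
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* Hand-simplified version (an assumption about the style, NOT real CIL output):
   - declarations are hoisted to the top of the body,
   - the loop is a while(1) with an explicit break,
   - "p->f" is written as "(*p).f",
   - the promotion to long is an explicit cast. */
long sum_list_simplified(const struct node *head) {
    long total;
    const struct node *p;

    total = 0L;
    p = head;
    while (1) {
        if (!(p != NULL)) break;
        total = total + (long)(*p).value;
        p = (*p).next;
    }
    return total;
}

/* Usage: both versions agree on the list 1 -> 2 -> 3. */
void check_sum_list(void) {
    struct node c = {3, NULL};
    struct node b = {2, &c};
    struct node a = {1, &b};
    assert(sum_list(&a) == 6);
    assert(sum_list_simplified(&a) == 6);
}
```

The point for an analysis tool is that it only has to handle the second shape: one kind of loop, one kind of field access, and no implicit conversions.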
" C is such a simple language, an IR is fairly easy to design. Why is C so "simple"?
Only base types are several varieties of ints and floats
No true arrays — e1[e2] just abbreviates *(e1+e2); all indices start at zero, there's no bounds checking
All parameters passed by value, and in order
Block structure is very restricted: all functions on same level (no nesting); variables are either truly global or local to a top-level function; implementation drastically simplified because no static links are ever needed and functions can be easily passed as parameters without scoping trouble.
Structure copy is bitwise
Arrays aren't copied element by element
Language is small; nearly everything interesting (like I/O) is in a library
Thus, the following tuples are sufficient:
x <- y                  x <- y[i]
x <- &y                 x[i] <- y
x <- *y                 goto L
*x <- y                 if x relop y goto L
x <- unaryop y          param x
x <- y binaryop z       call p, n
unaryop is one of: +, -, !, ~, ...
binaryop is one of: +, -, *, /, %, &, |, ^, ., &&, ||, ...
relop is one of ==, !=, <, <=, >, >=
x[i] means memory location x + i
call p,n means call p with n bytes of arguments" -- [3]
"Newspeak [96] is at the same level as MSIL, while maintaining some control structures and expression trees. These design decisions notably facilitate source code analyses. It is the intermediate language used in static analyzer Penjili." -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 42
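Going back to the tuple list quoted from [3]: a rough way to see that those forms are enough for ordinary C is to rewrite a small function so that every statement has the shape of one tuple. The sketch below does this by hand in C itself; `sum`, `sum_tuples` and `check_sum` are made-up names, and the rendering is loose in two places: the quoted x[i] is an unscaled memory offset whereas C's a[i] scales by the element size, and `return` is not in the quoted tuple list at all.

```c
#include <assert.h>

/* Source form: an ordinary loop over an array. */
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same function with one tuple-shaped statement per line.
   The comment on each line names the tuple form it corresponds to. */
int sum_tuples(const int *a, int n) {
    int s, i, t;

    s = 0;                    /* x <- y  (y is a constant here)          */
    i = 0;                    /* x <- y                                  */
loop:
    if (i >= n) goto done;    /* if x relop y goto L                     */
    t = a[i];                 /* x <- y[i]  (C scales i by sizeof(int))  */
    s = s + t;                /* x <- y binaryop z                       */
    i = i + 1;                /* x <- y binaryop z                       */
    goto loop;                /* goto L                                  */
done:
    return s;                 /* no tuple for this in the quoted list    */
}

/* Usage: both versions agree on a small array. */
void check_sum(void) {
    int data[] = {4, 5, 6, 7};
    assert(sum(data, 4) == 22);
    assert(sum_tuples(data, 4) == 22);
}
```

The temporary `t` is what makes the flattening work: any nested expression gets split into steps that each read at most two operands and write one result.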
"The CompCert project [22] for building a verified compiler presents a chain of intermediate languages for compilation of C programs, from Clight, a large subset of C, to plain assembly, through Cminor, which is similar to MSIL and Newspeak" -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 42
"C0 is a Pascal-like subset of C, similar to MISRA-C. It excludes the features of C that make full verification problematic: pointer arithmetic, unions, pointer casts. It is the intermediate language used in the Verisoft project for pervasive formal verification of computer systems.
Simpl is a very generic Sequential IMperative Programming Language. It offers high-level constructs like closures, pointers to procedures, dynamic method invocation, all above an abstract state that can be instantiated differently depending on the language analyzed. In his PhD thesis, Schirmer presents a two-fold embedding of a programming language in HOL theorem prover through Simpl. On the one hand, Simpl statements are deeply embedded inside HOL, which allows one to formally verify correctness of the analysis on statements. On the other hand, the abstract state is shallowly embedded inside HOL, for an easier application to program analysis of many different languages. Simpl is used as target language in the C0 compiler of the Verisoft project." -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 43