So:
- for low-level languages, we want to have UB (undefined behavior), because the goal is for them to be easy to implement (and efficient to compile). That means translating our language primitives directly and carelessly into the underlying platform's primitives; since the platform will often be C, we must inherit and expose at least as much UB as C does
- for HLLs (high-level languages), we can tolerate crashing but we cannot tolerate truly undefined behavior. And we cannot tolerate security issues, such as checks for valid ranges being removed as dead code because an invalid range 'should be' impossible.
- for very HLLs, how about something that narrows what the compiler is allowed to produce in the case of invalid code: when control flow reaches an invalid line of code (e.g. array out of bounds, integer overflow):
- if the invalidity is within an expression, the expression may result in any value of the underlying type, or it may result in ARBITRARY_VALUE. Like NaN, ARBITRARY_VALUE propagates: any future operation with an ARBITRARY_VALUE as input evaluates to another ARBITRARY_VALUE. If the underlying type is unsigned, ARBITRARY_VALUE compares greater than any value in the underlying type. If the underlying type is signed, each ARBITRARY_VALUE either compares greater than any value in the underlying type, or less than any value in the underlying type, but not both. An ARBITRARY_VALUE is not equal to any value in the underlying type. An ARBITRARY_VALUE is equal to itself and may or may not be equal to another ARBITRARY_VALUE. (The program may crash upon touching (reading/evaluating or even blindly copying/moving) an ARBITRARY_VALUE, but this is already implied by the following.)
- in addition, for any program containing invalid behavior, one or more of the following may also happen:
- the compiler may crash at compile time
- the program may crash at runtime, at any time
- where 'crash' means to cease to execute, perhaps without a nice error message, but without executing arbitrary behavior or leaving the system in a low-level corrupted state after the crash (so e.g. a segfault is allowed, but reformatting the HD is not)
why all the specifics about ARBITRARY_VALUE's comparisons? This is to prevent security checks from succeeding when they shouldn't, e.g.:
    (do something to produce an i which should always be less than 1000)
    if i < 1000:
        then we're good
    else:
        insecure situation detected, abort
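To make the comparison rules concrete, here is a minimal Python sketch of them. Everything named here (`ArbitraryValue`, the `above` flag, `check`, `range_check`) is made up for illustration; the notes above don't say what polarity a propagated value gets or how two ARBITRARY_VALUEs order against each other, so the sketch just picks something and says so in the comments.

```python
class ArbitraryValue:
    """Hypothetical stand-in for ARBITRARY_VALUE over an integer type."""

    def __init__(self, above=True):
        # above=True: greater than every ordinary value (the only option for unsigned).
        # above=False: less than every ordinary value (allowed for signed types).
        self.above = above

    def _propagate(self, other):
        # Like NaN: any arithmetic touching an ArbitraryValue yields another one.
        # Keeping the same polarity is an assumption of this sketch.
        return ArbitraryValue(self.above)

    __add__ = __radd__ = __sub__ = __rsub__ = __mul__ = __rmul__ = _propagate

    def __eq__(self, other):
        # Equal to itself, never equal to an ordinary value. Two distinct
        # ArbitraryValues "may or may not" be equal; this sketch says not.
        return self is other

    def __hash__(self):
        return id(self)

    def _less(self, other):
        if isinstance(other, ArbitraryValue):
            return NotImplemented  # ordering between two of them is left open
        return not self.above

    def _greater(self, other):
        if isinstance(other, ArbitraryValue):
            return NotImplemented
        return self.above

    # Never equal to an ordinary value, so <= behaves like < and >= like >.
    __lt__ = __le__ = _less
    __gt__ = __ge__ = _greater


def check(i):
    # The one-sided check from the example above.
    return "good" if i < 1000 else "abort"


def range_check(i):
    # A two-sided check, which rejects both polarities.
    return "good" if 0 <= i < 1000 else "abort"
```

With the "greater than everything" polarity, `check` aborts as intended. If I read the rules right, a signed ARBITRARY_VALUE of the "less than everything" polarity would still pass the one-sided `check`, so for signed types the check needs both bounds, as in `range_check`.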
this sort of specification allows invalid code to cause the implementation to:
- produce pointers such that dereferencing those pointers causes a segfault
- produce values which are larger than or smaller than any value in the supposed datatype (e.g. signed 32-bit arith can be executed as 64-bit arith, so overflow is just an ordinary 64-bit integer larger than, or smaller than, every int32)
- produce the wrong values in the proper datatype (e.g. signed 32-bit arith overflow can wrap)
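A rough sketch of what those permitted outcomes could look like for signed 32-bit addition, in Python. The function names are hypothetical, Python's unbounded ints stand in for 64-bit arithmetic, and raising an exception stands in for a crash; none of this is a real implementation, just the three behaviors side by side.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def add_i32_trap(a, b):
    # Crash on overflow (an exception stands in for a segfault or abort).
    r = a + b
    if not INT32_MIN <= r <= INT32_MAX:
        raise OverflowError("int32 overflow")
    return r


def add_i32_widen(a, b):
    # Execute in a wider type: on overflow the result is an ordinary integer
    # that lies above or below every int32.
    return a + b


def add_i32_wrap(a, b):
    # Two's-complement wraparound: a wrong value, but in the proper datatype.
    return (a + b - INT32_MIN) % 2**32 + INT32_MIN
```

For `INT32_MAX + 1`, the widening version gives a value that fails an `i < 1000` check, while the wrapping version gives `INT32_MIN`, which passes it.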
there are other situations, such as uninitialized memory, where we want just an actual arbitrary value that is in the desired datatype. So maybe rename ARBITRARY_VALUE above to ARBITRARY_SUPERVALUE or something like that.
C has something called 'unspecified', which is different from UB; this might be similar. It's been described as: "unspecified" means "anything and not always the same thing" [1]
People also talk about 'indeterminate values' and 'wobbly values' [2]
and "'undefined behaviour' (as opposed to implementation-defined, or some new incantation such as 'unknown result in variable but system is safe')" [3]
---
"In essence, modern C and C++ compilers assume no programmer would dare attempt undefined behavior. A programmer writing a program with a bug? Inconceivable! " -- Russ Cox
---