Table of Contents for Programming Languages: a survey

Security languages

E

https://en.wikipedia.org/wiki/E_(programming_language)

Links:

CHERI

http://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201406-isca2014-cheri.pdf

current spec: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-891.pdf

Capability Hardware Enhanced RISC Instructions (CHERI)

" To this end, CHERI blends traditional paged virtual memory with a per-address-space capability model that includes capability registers, capability instructions, and tagged memory that have been added to the 64-bit MIPS ISA. CHERI learns from the C-language fat-pointer literature: its capabilities describe fine-grained regions of memory and can be substituted for data or code pointers in generated code, protecting data and also providing Control-Flow Integrity (CFI). Strong monotonicity properties allow the CHERI capability model to express a variety of protection properties, from valid C-language pointer provenance and bounds checking to implementing the isolation and controlled commu- nication structures required for higher-level models such as software compartmentalization.

...

CHERI sup- ports architectural least privilege through in-address-space memory capabilities , which replace integer virtual-address representations of code and data pointers to allow fine-grained protec- tion of in-memory data, and also control-flow integrity. This application of least privilege provides strong protection against a broad range of memory- and pointer-based vulnerabilities and exploit techniques – buffer overflows, format-string attacks, pointer-corruption attacks, and so on.

...

CHERI capabilities limit how pointers can be used by scoping the ranges of memory (via bounds) and operations that can be performed (via permissions).

...

combining fat pointers with capabilities, allowing capabilities to be substituted for C pointers with only limited changes to program semantics.

...

We do, however, hypothesize that some of the efficiencies and safety features found in the hardware implementation would be difficult to accomplish in a software transformation: for example, automatic and atomic clearing of tags for in-memory pointers during arbitrary memory writes might come at substantial expense in software, but are “free” in hardware.

...

capability registers describe the rights ( protection domain ) of the executing thread to memory that it can access, and to object references that can be invoked to tran- sition between protection domains. We model these registers as a separate capability register file , supplementing the general-purpose file. Over time, we imagine that capa- bility registers will displace general-purpose registers as the preferred means to describe data and object references. Capability registers contain a tag, sealed bit, permission mask, base, length, and offset (allowing the description of not just bounded regions, but also a pointer into that region, improving C-language compatibility). Capability registers are suitable for describing both data and code, and can hence protect both data integrity/con- fidentiality and control flow. Certain registers are reserved for use in exception handling; all others are available to be managed by the compiler using the same techniques used with conventional registers.

...

capability instructions allow executing code to create, constrain (e.g., by increasing the base, decreasing the length, or reducing permissions), manage, and inspect capability register values. Both unsealed (memory) and sealed (object) capabilities can be loaded and stored via memory capability registers (i.e., dereferencing); object capabilities can be invoked, via special instructions, allowing a transition between protection domains, but their fields are given additional integrity protections to provide encapsulation. Capabil- ity instructions implement guarded manipulation

: invalid capability manipulations (e.g., to increase rights or length) and invalid capability dereferences (e.g., to access outside of a bounds-checked region) result in an exception that can be handled by the supervi- sor or language runtime. A key aspect of the instruction-set design is intentional use of capabilities
: explicit capability registers, rather than ambient authority, are used to in- dicate exactly which rights should be exercised, to limit the damage that can be caused by exploiting bugs.

...

Tagged memory associates a 1-bit tag with each capability-aligned and capability-sized word in physical memory, which allows capabilities to be safely loaded and stored in memory without loss of integrity. This functionality expands a thread’s effective pro- tection domain to include the transitive closure of capability values that can be loaded via capabilities via those present in its register file. For example, a capability register representing a C pointer to a data structure can be used to load further capabilities from that structure, referring to further data structures, which could not be accessed without suitable capabilities. Writes to capability values in memory that do not originate from a valid capability in the capability-register file will clear the tag bit associated with that memory, preventing accidental (or malicious) dereferencing of invalid capabilities.

...

Each capability (whether held in a register or stored in memory) has an associated 1-bit tag that consistently tracks pointer validity. In-memory tags are maintained by the memory subsystem as 1 bit for each capability-sized capability-aligned unit of memory (either 128 bits or 256 bits, depending on the ISA variant); they are not directly addressable. Other metadata associated with capabilities (e.g., bounds and permissions) are stored in addressable memory and protected by the corresponding tag bits. Tags follow capabilities into and out of capability registers with corresponding loads and stores. Capabilities are valid for dereference – for load, store, and instruction fetch – only if the tag is set. Dereferencing an untagged capability (i.e., one without a tag set) will cause a hard- ware exception. Tagged capabilities can be constructed only by deriving them from existing tagged capabilities, which ensures pointer provenance . Attempts to overwrite all or a portion of a capability in memory will automatically (and atomically) clear the tag. For example, this prevents arbitrary data received over the network from ever being directly dereferenced as a pointer.

...

Bounds on Pointers Capabilities contain lower and upper bounds for each pointer; while the pointer may move out of bounds (and back in again), attempts to dereference an out-of-bounds pointer will throw a hardware exception

...

Bounds originate in allocation events. The operating system places bounds on pointers ...as new address-space mappings are made available (e.g., via mmap sys- tem calls).

...

Permissions on Pointers Capabilities extend pointers with a permissions mask controlling how a pointer may be used; for example, the compiler may set the permissions so that pointers to data cannot be reused as code pointers, or so that pointers to code cannot be used to store data. Further permissions control the ability to load and store capabilities themselves, allowing the compiler to implement policies such as code and data pointers cannot be loaded from character strings.

Permissions can also be made accessible to higher-level aspects of the runtime and programmer model, offering dynamic enforcement of concepts similar to const.

...

The C-language const qualifier conflates several orthogonal properties and thus can not be enforced auto- matically. Our language extensions include more constrained __input and __output qualifiers.

...

Permissions, as with bounds, are linked to allocation events. Initial permissions for memory mappings will be introduced by the kernel via process startup and as new memory mappings are introduced. The run-time linker will be responsible for creating hierarchies of executable capabilities representing code pointers, from which (implicitly), other forms of runtime ad- dresses will be derived, such as return addresses. Read-only and readable/writable capabilities will originate with the run-time linker, heap allocator, and stack allocator. A separate hierarchy of type capabilities, used in sealing, will likely be provided independently via a system-call in- terface, to prevent arbitrary code and data capabilities being used for this purpose

...

Capability Monotonicity via Guarded Manipulation A new capability can be derived from another capability only via valid manipulations that may narrow (but never broaden) rights ascribed to the new capability. This property of capability monotonicity

...

Sealed Capabilities Capability sealing allows capabilities to be marked as immutable and non-dereferenceable , causing hardware exceptions to be thrown if attempts are made to modify or dereference them, or to jump to them. This enables capabilities to be used as unforgeable tokens of authority for higher-level software constructs grounded in encapsulation , while still allowing them to fit within the pointer-centric framework offered by CHERI capabilities. Sealed capabilities are the foundation for building the CheriBSD? object-capability model supporting in-address-space compartmentalization; they can also be used to support other operating-system or language robustness features, such as representing other sorts of delegated (non-hardware-defined) rights, or ensuring that pointers are dereferenced only by suitable code.

Capability Object Types Sealed capabilities contain an additional piece of metadata, an object type , set when a memory capability undergoes sealing. Object types allow multiple sealed capabilities to be indelibly (and indivisibly) linked, so that the kernel or language runtime can avoid expensive checks (e.g., via table lookups) to confirm that they are intended to be used together. For example, for object-oriented compartmentalization models (such as the CheriBSD? object-capability model), pairs of sealed capabilities represent objects: one as the code capability for a class, and the other a data capability representing the data associated with a particular instance of an object. In the CheriBSD? model, these two sealed capabilities have the same value in their object-type field, and two candidate capabilities passed to object invocation will not be accepted together if their object types do not match. The object-type field is set based on a second input capability authorizing use of the type space – itself simply a hardware permission authorizing sealing within a range of values speci- fied by the capability’s bounds. A similar model authorizes unsealing , which permits a sealed capability to be restored to a mutable and dereferenceable state – if a suitable capability to have sealed it is held. This is used in the CheriBSD? model during object invocation to grant the callee access to its internal state. A similar model could be achieved without using an unsealing mechanism: a suitably privi- leged component could inspect a sealed capability and rederive its unsealed contents. However, authorizing both sealing and unsealing based on type capabilities allows the right to construct encapsulated pointers to be delegated, without requiring recourse to a privileged software su- pervisor at the cost of additional domain transitions – or exercise of unnecessary privilege

...

Capability Flow Control The CHERI capability model is intended to model pointers: tagged memory allows capabilities to be stored in memory, and in particular, embedded within software-managed data structures such as objects or the stack. CHERI is therefore particularly subject to a historic criticism of capability-system models – namely, that capability propagation makes it difficult to track down and revoke rights (or to garbage collect them). To address this concern, CHERI has three mechanisms by which the flow of capabilities can be constrained: Capability TLB bits extend the existing load and store permissions on TLB entries (or, in ar- chitectures with hardware page-table walkers, page-table entries) with new permissions to authorize loading and storing of capabilities. This allows the operating system to maintain pages from which tagged capabilities cannot be loaded (tags will be transpar- ently stripped on load), and to which capabilities cannot be stored (a hardware exception will be thrown). This can be used, for example, to prevent tagged capabilities from be- ing stored in memory-mapped file pages (as the underlying object might not support tag storage), or to create regions of shared memory through which capabilities cannot flow

Capability load and store permission bits extend the load and store permissions on capabil- ities themselves, similarly allowing a capability to be used only for data access – if suit- ably configured. This can be used to create regions of shared memory within a virtual address space through which capabilities cannot flow. For example, it can prevent two separated compartments from delegating access to one another’s memory regions, instead limiting communication to data traffic via the single shared region

Capability control-flow permissions “color” capabilities to limit propagation of specific types of capabilities via other capabilities. This feature marks capabilities as global or local to indicate how they can be propagated. Global capabilities can be stored via any capabil- ity authorized for capability store. Local capabilities can be stored only via a capability specifically authorized as store local . This can be used, for example, to prevent propa- gation of temporally sensitive stack memory between compartments, while still allowing garbage-collected heap memory references to be shared.

...

The decision to strip tags on load, but throw an exception on store, reflects pragmatic software utilization goals: language runtimes and system libraries often need to implement capability-oblivious memory copying , as the programmer may not wish to specify whether a region of memory must (or must not) contain capabilities). By stripping tags rather than throw- ing an exception on load, a capability-oblivious memory copy is safe to use against arbitrary virtual addresses and source capabilities – without risk of throwing an exception

...

Hybridization with Integer Pointers Processors implementing CHERI capabilities also support existing programs compiled to use conventional integer pointers rather than capabilities, using two special capabilities: Default Data Capability indirects and controls non-capability-based pointer-based load and store instructions. Program Counter Capability extends the conventional program counter with capability meta- data, indirecting and controlling instruction fetches. Programs compiled to use capabilities to represent pointers (whether implicitly or via ex- plicit program annotations) will not use the default data capability, instead employing capability registers and capability-based instructions for pointer operations and indirection. The program- counter capability will be used regardless of code model is employed, although capability- aware code generation will employ constrained program-counter bounds and permissions to implement control-flow integrity rather than using a single large code segment

...

Flow control on pointers Capability (and hence pointer) flow propagation can be limited us- ing CHERI’s information flow-control mechanism, and used to enforce higher-level poli- cies such as stack capabilities cannot be written to global data structures.

...

One key property of CHERI capabilities is that although they are designed to represent pointers, they can also be used to protect other types ...represent hardware resources such as physical addresses, in- terrupt numbers...represent language-level type information, where there is not a hardware interpretation, but unforgeable tokens are required – for example, to au- thorize use of vtables by suitable C++ objects

...

Operating systems may be modified in a number of forms to support CHERI...:

Minimally modified kernel The kernel enables CHERI support in the processor, initializes register state during context creation, and saves/restores capability state during context switches, with the goal of supporting use of capabilities in userspace
Capability domain switching in userspace...For example, new CCall and CReturn exception handlers are created, which implement kernel-managed ‘trusted stacks’ for each user thread. Access to system calls is limited to authorized userspace domains....
Fine-grained capability protection in the kernel...the kernel is extended to support fine-grained memory protection throughout its design, replacing all kernel pointers with capabilities...
Capability domain switching in the kernel...
Capability-aware system-call interface. Regardless of the kernel code generation model, it is possible to add a new system-call Application Binary Interface (ABI) that replaces conventional pointers with capabilities...
Pure-capability operating system A clean-slate operating-system design might choose to min- imize or eliminate MMU use in favor of using the CHERI capability model for all protec- tion and separation

...

CHERI capabilities may refer to regions of memory, with bounded memory access (as in segments )....Alternatively, capabilities may refer to objects that can be invoked , which allows the im- plementation of protected subsystems – i.e., services that execute in a security domain other than the caller’s. At the moment of object invocation, caller capabilities are sealed to protect them from inappropriate use by the callee, and the invoked object is unsealed to allow the object callee to access private resources it requires to implement its services

...

In CHERI, even within an address space, existing and capability-aware code can be hybridized, as reads and writes via general-purpose MIPS registers are automatically in- directed through a reserved capability register before being processed by the MMU

...

the CHERI ISA design allows software context to address memory either via legacy MIPS ISA load and store instructions, which implicitly indirect through a reserved ca- pability register configurable by software, or via new capability load and store instructions that allow the compiler to explicitly name the object to be used.

...

Capabilities Capabilities are unforgeable tokens of authority through which programs access all memory and services within an address space. Capabilities may be held in capability registers, where they can be manipulated or dereferenced using capability coprocessor instructions, or in mem- ory. Capabilities themselves may refer to memory (unsealed capabilities) or objects (sealed capabilities). Memory capabilities are used as arguments to load and store instructions, to ac- cess either data or further capabilities. Object capabilities may be invoked to transition between protection domains using call and return instructions. Unforgeability is implemented by two means: tag bits and controlled manipulation. Each capability register, and each capability-aligned physical memory location, is associated with a tag bit indicating that a capability is valid. Attempts to directly overwrite a capability in memory using data instructions automatically clear the tag bit. When data is loaded into a capability register, its tag bit is also loaded; while data without a valid tag can be loaded into a capability register, attempts to dereference or invoke such a register will trigger an exception. Guarded manipulation is enforced by virtue of the ISA: instructions that manipulate capa- bility register fields (e.g., base, offset, length, permissions, type) are not able to increase the rights associated with a capability. Similarly, sealed capabilities can be unsealed only via the invocation mechanism, or via the unseal instruction subject to similar monotonicity rules. This enforces encapsulation, and prevents unauthorized access to the internal state of objects

...

Capability Registers CHERI supplements the 32 general-purpose, per-hardware thread registers provided by the MIPS ISA with 32 additional capability registers.

There are also several implicit capability registers associated with each hardware thread, including a memory capability that corresponds to the instruction pointer, and capabilities used during exception handling. This is structurally congruent to implied registers and system con- trol coprocessor (CP0) registers found in the base MIPS ISA. Unlike general-purpose registers, capability registers are structured, consisting of a 1-bit tag and a 256-bit set of architectural fields with defined semantics and constrained values.

...

Sealed bit If the sealed bit is unset, the capability describes a memory segment that is accessi- ble via load and store instructions. If it is set, the capability describes an object capability , which can be accessed only via object invocation .

Permissions The permissions mask controls operations that may be performed using the capa- bility; some permissions are hardware-defined, controlling instructions that may be used with the capability; others are software-defined and intended for use with the object- capability mechanism.

Object type Notionally, a software-managed object type used to ensure that corresponding code and data capabilities for an object are used together correctly.

Base This is the base address of a memory region.

Length This defines the length of a memory region.

Offset A free-floating pointer that will be added to the base when dereferencing a capability; the value can float outside of the range described by the capability, but an exception will be thrown if a requested access is out of range.

Reserved fields These bits are reserved for future experimentation.

Tag bit The tag bit is not part of the base 256 bits. It indicates whether or not the capability register holds a valid capability; this allows non-capability values to be moved via ca- pability registers, making it possible to implement software functions that, for example, copy memory oblivious to capabilities being present

...

As indicated earlier, the hardware capability type may be used to support language-level types, but should not be confused with language-level types.

...

Should we provide hardware-assisted garbage collection along the lines of the Java Virtual Machine’s garbage collection model? Should we implement explicit revocation functionality, along the lines of Redell’s capability revocation scheme (effectively, a level of indirection for all capabilities, or selectively when the need for revocation is anticipated)? We have instead opted for dual semantics grounded in the requirements of real-world low- level system software: CHERI lacks a general revocation scheme; however, in coordination with the software stack, it can provide for both strict limitations on the extent of hardware- supported delegation periods, and software-supported generalized revocation using interposi- tion. The former is intended to support the brief delegation of arguments from callers to callees across object-capability invocation; the latter allows arbitrary object reference revocation at a greater price. Local Capabilities and Revocation To this end, capabilities may be further tagged as local , which allows them to be processed in registers, stored in constrained memory regions, and passed on via invocation of other objects. The goal of local capabilities is to introduce a limited form of revocation that is appropriate for temporary delegation across protected subsystem invocations, which are not permitted to persist beyond that invocation. Among other beneficial properties, local capabilities allow the brief delegation of access to arguments passed by reference, such as regions of the caller’s stack (a common paradigm in C language programming). In effect, local capabilities inspire a single-bit information-flow model, bounding the po- tential spread of capabilities for ephemeral objects to capability registers and limited portions of memory. The desired protection property can be enforced through appropriate memory man- agement by the address-space executive: that is, local capabilities can be limited to a particular thread, with bounded delegation time down the (logical) stack. 3 It has been recommended that we substitute a generalized generation count-based model for an information flow model. This would be functionally identical in the local capability case, used to protect per-stack data. However, it would also allow us to implement protection of thread-local state, as well as garbage collection, if desired. The current ISA does not yet reflect this planned change. Generalized revocation is not supported directly by the CHERI ISA; instead, we rely on the language runtime to implement either a policy of virtual address non-reuse or garbage collection . A useful observation is that address space non-reuse is not the same as memory non- reuse: the meta-data required to support sparse use of a 64-bit address space scales with actual allocation, rather than the span of consumed address space. For many practical purposes, a 64- bit address space is virtually infinite 4 , so causing the C runtime to not reuse address space is now a realistic option. Software can, however, make use of interposition to implement revocation or other more semantically rich notions of privilege narrowing, as proposed in HYDRA.

...

Notions of Privilege

Ring-based privilege is derived from the commodity hardware notion that a series of suc- cessively higher-level rings provides progressively fewer rights to manage hardware protection...Attempts to perform privileged instructions will trap to a lower ring level, which may then proceed with the operation, or reject it...The trap mechanism itself is modified in CHERI, in order to save and restore the capability register state required within the execution of each ring ...

Hardware capability context privilege is a new notion of privilege that operates within rings, and is managed by the capability coprocessor. When a new address space is instantiated, code executing in the address space is provided with adequate initial capabilities to fully manage the address space, and derive any required capabilities for memory allocation, code linking, and object-capability type management. In CHERI, all capability-related privileges are captured by capabilities, and capability operations never refer to the current processor ring to authorize operations, although violation of a security property (i.e., an attempt to broaden a memory capability) will lead to a trap, and allow a software supervisor in a lower ring to provide al- ternative semantics. This approach follows the spirit of Paul Karger’s paper on limiting the damage potential of discretionary Trojan horses [41], and extends it further.

Supervisor-enforced capability context privilege is a similar notion of privilege that may also be implemented in software trap handlers. For example, an operating system kernel may choose to accept system-call traps only from appropriately privileged userspace code (e.g., by virtue of holding a capability with full access to the userspace address space, rather than just narrow access, or that has a reserved user-defined permission bit set), and therefore can check the capability registers of the saved context to determine whether the trap was from an appropriate execution context. This might be used to limit system call invocation to a specific protected subsystem that imposes its own authorization policy on application components by safely wrapping system calls from userspace.

Reserved capability registers for exception handling New exception-handling functional- ity is required to ensure that exception handlers themselves can execute properly. We reserve several capability registers for use both by the exception-handling mechanism itself (describing the rights that the exception handler will run with) and for use by software exception handlers (a pair of reserved registers that can be used safely during context switching). This approach is not dissimilar from the current notion of exception-handling registers in the MIPS ABI, which reserves two general-purpose registers for this purpose. However, whereas the MIPS ABI simply dictates that user code cannot rely on the two reserved exception registers being preserved, CHERI requires that access is blocked, as capability registers delegate rights and also hold data. We currently grant access to exception-related capability registers by virtue of special permission bits on the capability that describe the currently executing code; attempting to access reserved registers without suitable permission will trigger an exception

Saved program-counter capability Exception handlers must also be able to inspect excep- tion state; for example, as PC , the program counter, is preserved today in a control register, EPC , the program counter capability must be preserved as EPCC so that it can be queried

...

each capability-aligned and capability-sized word in memory has an additional tag bit . The bit is set whenever a capability is atomically written from a register to an authorized memory location, and cleared if a write occurs to any byte in the word using a general-purpose store instruction. Capabilities may be read only from capability-aligned words, and only if the tag bit is set at the moment of load; otherwise, a capability load exception is thrown. Tags are associated with physical memory locations, rather than virtual ones, such that the same memory mapped at multiple points in the address space, or in different address spaces, will have the same tags. Tags require strong coherency with the data they protect, and it is expected that tags will be cached with the memory they describe within the cache hierarchy. Strong atomicity properties are required such that it is not possible to partially overwrite a capability value in memory while retaining the tag

...

Capability Instructions Various newly added instructions are documented in detail in Chapter 5. Briefly, these in- structions are used to load and store via capabilities, load and store capabilities themselves, manage capability fields, invoke object capabilities, and create capabilities

...

Object Capabilities As noted above, the CHERI design calls for two forms of capabilities: capabilities that describe regions of memory and offer bounded-buffer “segment” semantics, and object capabilities that permit the implementation of protected subsystems. In our model, object capabilities are repre- sented by a pair of sealed code and data capabilities, which provide the necessary information to implement a protected subsystem domain transition. Object capabilities are “invoked” using the CCall instruction (which is responsible for unsealing the capabilities, performing a safe security-domain transition, and argument passing), followed by CReturn (which reverses this process and handles return values).

...

In the longer term, we hope to investigate the congruence of object-capability invocation with message-passing primitives between hardware threads: if each register context represents a security domain, and one domain invokes a service offered by another domain, passing a small number of general-purpose and capability registers, then message passing may offer a way to provide significantly enhanced performance. 5 In this view, hardware thread contexts, or register files, are simply caches of thread state to be managed by the processor

...

By adopting a RISC-like approach, in which traps to a lower ring occur when hardware-supported semantics is exceeded, we will be able to supplement the hardware model through modifications to the supervisor

...

Should DMA be forced to pass through the capability equivalent of an I/O MMU in order to be appropriately constrained?...For the time being, device drivers continue to hold the privilege for DMA to use arbitrary physical memory addresses, although hybrid models – such as allowing DMA access only to specific portions of physical memory – may prove appropriate. Similar problems have plagued virtualization in commodity CPUs, where guest operating systems require DMA memory per- formance but cannot be allowed arbitrary access to physical memory

...

Capabilities can be used to implement pointers into virtual address spaces;

Tags on registers determine whether they are valid pointers for loading, fetching, or jump- ing to;

Tagged registers can contain both data and capabilities, allowing (for example) capability- oblivious memory copies;

Tags on pointer-sized, pointer-aligned units of memory preserve validity (or invalidity) across loads and stores to memory;

Tags are associated with physical memory locations – i.e., if the same physical memory is mapped at two different virtual addresses, the same tags will be used;

Attempts to store data into memory that has a valid tag will atomically clear the tag;

Capability loads and stores to memory offer strong atomicity with respect to capability values and tags preventing races that might yield combinations of different capability values, or the tag remaining set when a corrupted capability is reloaded;

Pointers are extended to include bounds and permissions; the “pointer” is able to float freely within (and to varying extents, beyond) the bounds;

Permissions are sufficient to control both data and control-flow operations;

Guarded manipulation implements monotonicity: rights can be reduced but not increased through valid manipulations of pointers;

Invalid manipulations of pointers violating guarded-manipulation rules lead to an ex- ception or clearing of the valid tag, whether in a register or in memory, with suitable atomicity;

Loads via, stores via, and jumps to capabilites are constrained by their permissions and bounds, throwing exceptions on a violation;

Capability exceptions, in general, are delivered with greater priority than MMU excep- tions;

Permissions on capabilities include the ability to not just control loading and storing of data, but also loading and storing of capabilities;

Capability-unaware loads, stores, and jump operations via integer pointers are constrained by implied capabilities such as the Default Data Capability and Program Counter Capa- bility, ensuring that legacy code is constrained;

If present, the Memory Management Unit (MMU), whether through extensions to software- managed Translation Lookaside Buffers (TLBs), or via page-table extensions for hardware- managed TLBs, contains additional permissions controlling the loading and storing of capabilities;

Strong C-language compatibility is maintained through definitions of NULL to be un- tagged, zero-filled memory, instructions to convert between capabilities and integer point- ers, and instructions providing C-compatible equality operators;

Reserved capabilities, whether special registers or within a capability register file, allow a software supervisor to operate with greater rights than non-supervisor code, recovering those rights on exception delivery;

A simple capability control-flow model to allow the propagation of capabilities to be constrained;

Sealed capabilities allow software-defined behaviors to be implemented, along with suit- able instructions to (with appropriate authorization) seal and unseal capabilities based on permissions and object types;

By clearing hardware-defined permissions, and utilizing software-defined permissions, capabilities can be used to represent spaces other than the virtual address space;

For compressed capabilities, pointers can stray well out-of-bounds without becoming unrepresentable;

For compressed capabilities, alignment requirements do not restrict common object sizes and do not restrict large objects beyond common limitations of allocators and virtual memory mapping; and

That through inductive properties of the instruction set, from the point of CPU reset, via guarded manipulation, and suitable firmware and software management, it is not possible to “forge” capabilities or otherwise escalate privilege other than as described by this model and explicit exercise of privilege (e.g., via saved exception-handler capabilities, unsealing, etc).

...

Each capability register also has an associated tag indicating whether it currently contains a valid capability. Any load and store operations via an invalid capability will trap

...

registers:

PCC: Program counter capability (PCC): the capability through which PC is indirected by the processor when fetching instructions.

DDC: ( C0 ) Capability register through which all non-capability load and store instructions are indirected. This allows legacy MIPS code to be controlled using the capability coprocessor.

C1: ... C25 General-purpose capability registers referenced explicitly by capability-aware instructions.

IDC: ( C26 ) Invoked data capability: the capability that was unsealed at the last protected procedure call. This capability holds the unlimited capability at boot time.

KR1C: ( C27 ) A capability reserved for use during kernel exception handling.

KR2C: ( C28 ) A capability reserved for use during kernel exception handling.

KCC: ( C29 ) Kernel code capability: the code capability moved to PCC when entering the kernel for exception handling.

KDC: ( C30 ) Kernel data capability: the data capability containing the security domain for the kernel exception handler.

EPCC: ( C31 ) Capability register associated with the exception program counter ( EPC ) required by exception handlers to save, interpret, and store the value of PCC at the time the exception fired.

...

C1 - C2 are caller-save. During a cross-domain call, these are used to pass the PCC and IDC values, respectively. In the invoked context, they are always available as tempo- raries, irrespective of whether the function was invoked as the result of a cross-domain call.

C3 - C10 are used to pass arguments and are not preserved across calls.

C11 - C16 are caller-save registers.

C17 - C24 are callee-save registers.

C25 is currently unused. In all ABIs, the following convention also applies:

C3 optionally contains a capability returned to a caller (congruent to MIPS $v0 , $v1 ). The pure-capability ABI, used within compartments or for pure-capability (“CheriABI“) appli- cations, implements the following further conventions for capability use:

C11 , in the pure-capability ABI, contains the stack capability (congruent to MIPS $sp ).

C12 , in the pure-capability ABI, contains the jump register (congruent to MIPS $t9 ).

C17 , in the pure-capability ABI, contains the link register (congruent to MIPS $ra ).

When calling (or being called) across protection domains, there is no guarantee that a non- malicious caller or callee will abide by these conventions. Thus, all registers should be re- garded as caller-save, and callees cannot depend on caller-set capabilities for the stack and jump registers.

. Additionally, all capability registers that are not part of the explicit argument or return-value sets should be cleared via explicit assignment or via the CClearHi? and CClearLo? instructions.

...

Where rights are explicitly passed between do- mains, it may be desirable to clear the global bit that will (in a suitably configured runtime) limit further propagation of the capability.

...

Cross-Domain Procedure Calls A cross-domain procedure call, instruction CCall , escapes to a handler that takes a sealed executable (“code”) and sealed non-executable (“data”) capability with matching types. If the types match, the unsealed code capability is placed in PCC and the unsealed data capability is placed in IDC . The handler will also push the previous PCC , IDC , PC + 4, and SP to a stack pointed to by TSC . The stack pointer TSC may be implemented either as a hardware register or as a variable internal to a software implementation of CCall . The caller should invalidate all registers that are not intended to be passed to the callee before the call. A cross-domain procedure return, instruction CReturn , also escapes to a handler that pops the code and data capabilities from the stack at TSC and places them in PCC and IDC , re- spectively; it likewise pops PC and SP . The callee should invalidate all registers that are not intended to be passed to the caller before the return.

The caller is responsible for ensuring that its protection domain is entirely embodied in the capability in IDC so that it can restore its state upon return.

...

These semantics are software defined, and we anticipate that different operating-system and programming-language security models might handle these, and other behaviors, in different ways. For example, in our prototype CheriBSD? implementation, the operating-system kernel maintains a “trusted stack” onto which values are pushed during invocation, and from which values are popped on return

...

KCC and KDC hold the code capability and data capability that describe the protection domain of the system exception handler. When an exception occurs, the victim PCC is copied to EPCC so that the exception may return to the correct address, and KCC , excepting its offset field, which will be set to the appropriate MIPS exception-vector address, is moved to PCC to grant execution rights for kernel code. When an exception handler returns with ERET , EPCC , possibly after having been updated by the software exception handler, is moved into PCC . KDC may be manually installed by the exception handler if needed, and will typically be moved into DDC in order to allow otherwise unmodified MIPS exception handlers to be used. This may also need to be restored before returning from the exception

Capabilities Each capability register contains the following fields:

Tag bit (“ tag ”, 1 bit)
Sealed bit (“ s ”, 1 bit)
Permissions mask (“ perms”)
User-defined permissions mask (“ uperms ”)
Object type (“ otype ”, 24 bits)
Offset (“ offset ”, 64 bits)
Base virtual address (“ base ”, 64 bits)
Length in bytes (“ length ”, 64 bits)

...

Sealed Bit The s flag indicates whether a capability is usable for general-purpose capability operations. If this flag is set, the capability is sealed and it may be used only by a CCall instruction. If the CCall instruction receives a sealed executable capability and a sealed non-executable capability with matching otype fields, both capabilities will have their s flag cleared and control flow branches to the code indicated by the code capability, thus entering a new security domain.

...

Permission Bits The perms bit vector governs the permissions of the capability including read, write and ex- ecute permissions. Bits 0–7 and 10 of this field, which control use and propagation of the capability, and also limit access to exception-handling features, are described in Table 5.2.

User-Defined Permission Bits The uperms bit vector may be used by the kernel or application programs for user-defined permissions. They can masked and retrieved using the same CAndPerm? and CGetPerm? instruc- tions that operate on hardware-defined permissions, and also be checked using the CCheckPerm? instruction. When using 256-bit capabilities, 16 user-defined permission bits are available; with 128-bit capabilities, 4 user-defined permission bits are available

User-defined permission bits can be used in combination with existing hardware-defined permissions (e.g., to annotate code or data capabilities with further software-defined rights), or in isolation of them (with all hardware-defined permissions cleared, giving the capability only software-defined functionality). ...

Object Type Th 24-bit otype field holds the “type” of a sealed capability; this field allows an unforgeable link to be created between associated data and object capabilities. While earlier versions of the CHERI ISA interpreted this field as an address, recent versions treat this as an opaque software-managed value.

...

Capability Permissions Table 5.2 shows the definition of bits 0–7 of the perms field.

Global: Allow this capability to be stored via capabilities that do not themselves have Permit Store Local Capability set.
Permit Execute: Allow this capability to be used in the PCC register as a capability for the program counter.
Permit Store Capability: Allow this capability to be used as a pointer for storing other capa- bilities.
Permit Load Capability: Allow this capability to be used as a pointer for loading other capa- bilities.
Permit Store: Allow this capability to be used as a pointer for storing data from general- purpose registers.
Permit Load: Allow this capability to be used as a pointer for loading data into general- purpose registers.
Permit Store Local Capability: Allow this capability to be used as a pointer for storing local capabilities.
Permit Seal: Allow this capability to be used to seal or unseal capabilities that have the same otype
((bit 10?)) Access System Registers: Allow access to EPCC , KDC , KCC , KR1C and KR2C when this capability is in PCC

...

The Permit Store Local Capability permission bit is used to limit capability propagation via software-defined policies: local capabilities (i.e., those without the Global permission set) can be stored only via capabilities that have Permit Store Local Capability set. Normally, this permission will be set only on capabilities that, themselves, have the Global bit cleared. This allows higher-level, software-defined policies, such as “Disallow storing stack references to heap memory” or “Disallow passing local capabilities via cross-domain procedure calls,” to be implemented. We anticipate both generalizing and extending this model in the future in order to support more complex policies – e.g., relating to the propagation of garbage-collected pointers, or pointers to volatile vs. non-volatile memory.

...

When the cause of an exception is that the attempted operation is prohibited by the capability system, the ExcCode? field within the cause register of coprocessor 0 are set to 18 ( C2E , coprocessor 2 exception), PCC and EPCC are set as described in Section 5.5 and capcause is set as described below.

...

Table 5.3: Capability Exception Codes

Value Description 0x00: None 0x01: Length Violation 0x02: Tag Violation 0x03: Seal Violation 0x04: Type Violation 0x05: Call Trap 0x06: Return Trap 0x07: Underflow of trusted system stack 0x08: User-defined Permission Violation 0x09: TLB prohibits store capability 0x0a: Requested bounds cannot be represented exactly 0x0b: reserved 0x0c: reserved 0x0d: reserved 0x0e: reserved 0x0f: reserved 0x10: Global Violation 0x11: Permit Execute Violation 0x12: Permit Load Violation 0x13: Permit Store Violation 0x14: Permit Load Capability Violation 0x15: Permit Store Capability Violation 0x16: Permit Store Local Capability Violation 0x17: Permit Seal Violation 0x18: Access System Registers Violation 0x19: reserved 0x1a: reserved 0x1b: reserved 0x1c: reserved 0x1d: reserved 0x1e: reserved 0x1f: reserved

If the last instruction to throw an exception did not throw a capability exception, then the ExcCode? field of capcause will be None .

ExcCode? values from 128 to 255 are reserved for use by application programs. (A program can use CSetCause? to set ExcCode? to a user-defined value).

The RegNum? field of capcause will hold the number of the capability register whose per- mission was violated in the last exception, if this register was not the unnumbered register PCC . If the capability exception was raised because PCC did not grant access to a numbered reserved register, then capcause will contain the number of the reserved register to which ac- cess was denied. If the exception was raised because PCC did not grant some other permission (e.g., permission to read capcause was required, but not granted) then RegNum? will hold 0xff. ...If an exception is caused by the combination of the values of a capability register and a general purpose register (e.g., if an expression such as clb t1, t0(c0) raises an exception because the offset t0 is trying to read beyond c0 ’s length), the number of the capability register (not of the general-purpose register) will be stored in capcause . ...

The CGetCause? instruction can be used by an exception handler to read the capcause register. CGetCause? will raise an exception if PCC . perms . Access System Registers is not set, so the operating system can prevent user space programs from reading capcause directly by not granting them Access System Registers permission ... All capability exceptions (C2E) have higher priority than address error exceptions (AdEL?, AdES?)....Some of these priority rules are security critical. In particular, an exception caused by a register being reserved must have priority over other capability exceptions (e.g., AdEL? and AdES?) to prevent a process from discovering information about the contents of a register that it is not allowed to access. Other priority rules are not security critical, but are defined by this specification so that exception processing is deterministic.

...

In the current hardware implementation of CHERI, the CCall and CReturn instructions al- ways raise an exception, so that the details of the call or return operation can be implemented in software by a trap handler.

...

CPU Reset When the CPU is hard reset, all capability registers will be initialized to the following values:

The tag bit is set.
The s bit is unset.
offset = 0 (except for PCC . offset , which will be initialized to the boot vector address)
base = 0
length = 2 64
otype = 0
All available permission bits are set. (When the 256-bit capability representation is used, 31 permission bits are available, including 16 user-defined permissions. When the 128- bit capability representation is used, 19 permission bits are available, including 4 user- defined permissions; the remaining 12 user-defined permissions at the most-significant end of the permissions field are not available, and are set to zero. Permission bits 8 and 9 are currently reserved for future use; these are included in the the 31 (or 19) permission bits that are set on reset).

All unused bits are cleared. The initial values of PCC and KCC will allow the system to initially execute code relative to virtual address 0 . The initial value of DDC will allow general-purpose loads and stores to all of virtual memory for the bootstrapping process.

...

Jump and link register: After a jalr instruction, the return address is relative to PCC.base

...

Figure 5.2: 256-bit memory representation of a capability

unused/reserved: *8 bits)
s: 1 bit
otype (24 bits)
uperms (16 bits)
perms (15 bits)
cursor/offset (64 bits)
base (64 bits)
length (64 bits)

.... Potential Future Changes to the CHERI ISA ...

CMove ... CCMove conditional-move instruction ... Provide a separate instruction for clearing the global bit on a capability. Global is cur- rently treated as a permission, but it is really an information flow label rather than a permission. We may want to allow clearing the global bit on a sealed capability, which would be easiest to implement with a separate instruction, as permissions cannot be changed on sealed capabilities ... Provide an “import capability” instruction that takes as arguments an untagged capability cb and a tagged capability ct . The result is cb with the tag bit set, provided that the 90 access granted by the result is a subset of the access granted by ct ; an exception is raised otherwise. ... Provide a CFromInt? instruction that copies a general-purpose register into the offset field of a capability register, clearing all the other fields of the capability – including the tag bit. ... Provide a conditional branch instruction that branches depending on whether a capability is equal or not equal to NULL. Checking the tag bit with CBTU is not the same as checking for equality with NULL. In the current ISA, several instructions are needed to do the latter ... Consider adding a CTailCall? variation that will not push an additional frame onto the trusted stack ... Investigate a three-capability variation on object capabilities for the 128-bit version of CHERI. This would provide more bits to be used in describing classes and objects, and avoid requiring storing the object type in pointer bits.

Extend the function of the system permission bit(s) to cover not just access to exception- related capabilities, but also other privileged aspects of the underlying ISA. For example, the permission bit could also control use of MIPS CP0 registers, TLB manipulation in- structions, and exception-related instructions. This would allow compartmentalization of code within privileged rings on MIPS.

Introduce a Perm Unseal permission that can be used to unseal sealed capabilities of a type – without necessarily authorizing sealing.

Introduce support for a userspace exception handler for CCall and CReturn , allowing more privileged user code (rather than kernel code) to implement the semantics of domain switching, provide memory for use in trusted stacks (if any), and so on. This would allow application environments to provide their own object models without needing to depend on highly privileged kernel code. ...

Figure 6.3: Capability coprocessor instruction summary

((note: you can't modify a sealed capability, except to unseal it)) ((note: the notes below on what requires what capability is non-exhaustive; i just wrote down a few that i happened to see)

CGet(Base

Offset

Len

Tag

Sealed

Perm

Type): Move x to a general-purpose register

CToPtr?: Capability to pointer
CPtrCmp?: Compare capability pointers
CClearRegs?: Clear multiple registers
CIncOffset?: Increase offset
CSetBounds?: Set bounds
CSetBoundsExact? Set bounds exactly
CSetLen?: Instruction removed
CClearTag?: Clear the tag bit
CAndPerm?: Restrict permissions
CSetOffset?: Set cursor to an offset from base
CGetPCC?: Move PCC to capability register
CGetPCCSetOffset?: Get PCC with new offset
CFromPtr?: Create capability from pointer (and capability)
CSub: Subtract capabilities
CLC/CSC: Load/Store capability register (requires load/store capability capability; also store requires either 'global' or 'permit_store_local')
CL[BHWD][U]/CS[BHWD]: Load via capability register (requires load/store capability)
CLLC/CSSC: Load/store linked capability via capability register (like load/store capability but marks the memory location upon load so that a matching CSCC checks if the memory location has changed in the meantime; for concurrency)
CLL[BHWD][U]/CSC[BHWD]: Load/store linked via capability register (like CLL/CSC)
CBT(U

S): Branch if capability tag is (unset

set)

CJ(AL)R: Jump (and link) capability register (requires execute capabilty)
CCheck(Perm

Type): Raise exception (on insufficient permission

if object types do not match)

CSeal (requires Permit_seal on the address being sealed to (the otype)), CUnseal (requires Permit_seal on the otype)
CCall (requires two capabilities, both sealed to the same otype, one with Permit_execute and one without), CReturn
CGetCause?: Move the capability exception cause register to a general-purpose register
CSetCause?: Set the capability exception cause register "

" Table 1: CHERI instruction-set extensions

CGetBase? Move base to a GPR CGetLen? Move length to a GPR CGetTag? Move tag bit to a GPR CGetPerm? Move permissions to a GPR CGetPCC? Move the PCC and PC to GPRs CIncBase? Increase base and decrease length CSetLen? Set (reduce) length CClearTag? Invalidate a capability register CAndPerm? Restrict permissions CToPtr? Generate C0-based integer pointer from a capability CFromPtr? CIncBase? with support for NULL casts CBTU Branch if capability tag is unset CBTS Branch if capability tag is set CLC Load capability register CSC Store capability register CL[BHWD][U] Load byte, half-word, word or double via capability register, (zero-extend) CS[BHWD] Store byte, half-word, word or double via capability register CLLD Load linked via capability register CSCD Store conditional via capability register CJR Jump capability register CJALR Jump and link capability register

"-- https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201406-isca2014-cheri.pdf

---

note that 'global' is on the capability being stored, not the authorizing capability (all of the others are on the authorizing capability)

although the 'otype' in a sealed capability is calculated as a base+offset during sealing, the address which is jumped to upon CCall is not necessarily the same as the otype address

CHERI is a set of extensions that add a capability model to memory. The hardware supports a capability type that is protected by a non-addressable tag bit when stored in memory. A capability grants rights to a range of an address space (e.g. read, write, and / or execute permissions). Every memory access (load, store, instruction fetch) must be authorised by a capability, which must be presented for the operation to succeed. For new load and store instructions, the base operand is a capability in a general-purpose capability register. For legacy instructions, the capability is an implicit default data capability.

In CHERI C/C++, every pointer is lowered by the compiler to be represented by a capability. This means that you cannot access any memory except via a pointer that was created by something holding a more powerful capability. For example, the compiler will derive bounded capabilities to from the stack capability register for stack allocations. The OS will return a capability in response to mmap system calls, which the memory allocator will then hold and subdivide to hand out object-bounded capabilities in response to malloc. This means that you cannot forge a pointer and you cannot ever access out of bounds of an object (guaranteed by the hardware). With our temporal safety work, you also cannot access an object that has been freed and reallocated (guaranteed by software, accelerated by some hardware features). To be able to compiler for this kind of target, the compiler must maintain the integer/pointer distinction all of the way through to the final machine-code lowering (arithmetic on pointers uses different instructions to arithmetic on integers, for example).

The CHERI extensions are designed to be architecture neutral. We originally prototyped on MIPS and are now dong RISC-V prototyping and are in the early stages of an official CHERI RISC-V extension. Morello is a localisation to AArch64 and Arm has produced a few thousand test chips / dev systems based on the Neoverse N1 for software development and experimentation. This is what I have under my desk: a modified quad-core Neoverse N1 running at 2.5GHz with 16 GiB? of RAM in a desktop chassis with a 250 GiB? SSD. We also have a load of them in a rack for CI (snmalloc CI on Morello and benchmarking.

If all goes well, I expect to see CHERI extensions on mainstream architectures on the next few years and so developing a compiler toolchain based on an abstraction that can’t possibly support them without significant rearchitecting seems like an unfortunate decision, especially when maintaining a separate pointer type is fairly simple if you design it in from scratch. The fact that LLVM had separate integer and pointer types in IR made the original CHERI work feasible, the fact that it loses that distinction by the time it reaches SelectionDAG? and the back end (one of the first questions the target-agnostic code generator asks the back end is ‘so, which integer type do you want to use for pointers?’) made it harder.

" -- https://lobste.rs/s/yqakzf/qbe_1_0#c_dfwqit

old CHERI 256-bit fat pointer format (from https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/2019tc-cheri-concentrate.pdf ):

object type ("otype"): 24 bits
permissions: 31 bits
sealed: 1 bit
object length: 64 bits
base: 64 bits
pointer address: 64 bits

new CHERI 128-bit fat pointer format (from https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/2019tc-cheri-concentrate.pdf ):

permissions: 15 bits
not sure what the semantics here are, [1] says "supports permissions bits on capabilities for read, write, and execute, and to limit capability propagation, with a few permission bits reserved for software interpretation". Maybe see here? https://cnlelema.github.io/memo/en/cheri/cheri-domain/cheri-perm/ -- that document notes 11 defined permissions: global, execute, load, store, load_cap, store_cap, store_local_cap, seal, ccall, unseal, system_regs, although it notes some uncertainty about "unseal". What are the semantics of these?
[2] says "For compartmentalization and fast domain crossing, capabilities can be made immutable and non-dereferenceable (“sealed capabilities”) by using another capability as a key, and the sealed capability is given an Object Type (otype) from the key. Sealed capabilities can be unsealed only by the key or by secure Capability Calls (CCalls). Calling into another domain requires a pair of otype-matched sealed code and data capabilities of the callee to enter the secure domain. Upon a CCall, the sealed pair is atomically unsealed and installed, transferring control to the new domain. The immutability and non-dereferenceability also guarantee that the caller cannot tamper with callee’s state using sealed capabilities." Sealed capabilities can be unsealed only by the key or by secure Capability Calls (CCalls)."
this explains many of the permission types: https://msrc-blog.microsoft.com/2022/09/06/whats-the-smallest-variety-of-cheri/
other stuff: 49 bits (I think this might be dropping the "object type" information completely but representing the sealed, object length, and base information with some loss of precision?)
no, maybe otype is still there: "Jon did some amazing work based on the LowFAT? pointer to trim capabilities down from 256 bits to 128, including a full base, top, 64-bit address, permissions, and otype for sealing." -- [3]
pointer address: 64 bits

special instruction types (from https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/2019tc-cheri-concentrate.pdf ):

load/store capability
cap pointer arithmetic
jump to capability
get special capability
compare capabilities
set capability bounds
read a capability field

some more CHERI stuff:

[4]
- https://lobste.rs/s/ez2wph/how_hard_is_it_adapt_memory_allocator

the CHERI api is here: CHERI C/C++ Programming Guide section 7 page 28, "C APIS TO GET AND SET CAPABILITY PROPERTIES"

https://tratt.net/laurie/blog/2023/two_stories_for_what_is_cheri.html

https://capabilitiesforcoders.com/

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 9)

---

SAFE

http://www.crash-safe.org/ http://www.crash-safe.org/assets/ieee-hst-2013-paper.pdf

---

http://www.animats.com/papers/verifier/verifiermanual.pdf

PASCAL-F verifier

Scott D. Johnson and John Nagle's Pascal-F Verifier language:

http://www.animats.com/papers/verifier/verifiermanual.pdf

" The information stored for each program unit is

The name, formal argument list, and result type
The ENTR Ya nd EXIT assertions
The INVARIANT assertions
The list of global variables referenced
The list of global variables altered
The list of all routines called, and the arguments to each call
Information concerning multiprogramming "
ASSERT (<Boolean expression list> ) ;
STATE (<Boolean expression list> ) ; ((i don't understand how this is different from ASSERT?))
MEASURE ( <numeric expression> )

"Every WHILE and REPEAT loop (except for deliberate infinite loops, which must begin with "WHILE true DO" or end with ‘‘UNTIL f alse;’ ’) must contain a MEASURE statement. The MEASURE statement must immediately follo wt he ST AT Es tatement associated with the loop. The <numeric e xpression> is a limit on the number of iterations of the loop body left to be performed. More specifically ,the Verifier will insist that the value of the expression

be greater than or equal to zero when control reaches the MEASURE statement, and
decrease by at least one (for inte gers) or the precision of the expression (for fix ed point numbers) each time control flows from the MEASURE statement around the loop back to the STATE statement. How complex a MEASURE statement is required is a function of how obvious it is that the loop always terminates. FOR loops always terminate, and no MEASURE statement is used. "

2.3 "New expressions The Verifier allows the use of some new constructs for building expressions."

DEFINED(<variable>) DEFINED(<array>,<low bound>,<high bound>) DEFINED(<block name>)

The OLD annotation:

"At the beginning of every procedure and function, the Verifier implicitly saves the value at entry of e very variable used as an input to the routine. In assertions, it is possible to refer to these values by placing an .OLD follo wing a variable name or selector expression." eg "

TYPE smallint = 1..100;
PROCEDURE bump(VAR x:  smallint);
ENTRY x < 100;
EXIT x = x.OLD + 1;

IMPLIES: "Informally, the expression p IMPLIES q means, "if p is TRUE, then q will be TRUE as well... Because it happens to be the case that FALSE <= TRUE, every IMPLIES operator in Pascal-F can be replaced by the operator <= without changing the meaning of the program."

"The assertions given in an INVARIANT declaration are required to hold at multiple points in a program...if an assertion P is declared to be an INVARIANT of a construct, P must be true whenever control passes into or out of that construct."

2.3.1 "Verification declarations To v erify a program, more information about each routine is needed than what is necessary to compile a program. The Verifier expects routine headers that are more comple xt han those of standard P ascal. The extended syntax is:

<block> ::= <entry declaration part>
<exit declaration part>
<effect declaration part>
<invariant declaration part>
<depth declaration part>
<constant declaration part>
<type declaration part>
<variable declaration part>
<statement part>

Each declaration part is either empty ,orc onsists of a k eywo rd followed by a series of declarations, each of which is followed by a semicolon. "

"The following example shows ho wO LD, ENTR Y, E XIT ,a nd EFFECT are used to specify a module.

MODULE stack; EXPORTS size, push, pop; VAR stp: 0..maxsize; buf: array [1..maxsize] of stackelt;

EFFECT size = 0;

INVARIANT FOR x: 1..maxsize ALL x<stp IMPLIES DEFINED(buf[x]);

PROOF FUNCTION size: 0..maxsize; EXIT size=stp; BEGIN size := stp END;

PROCEDURE push(x: stackelt); ENTRY size<maxsize; EFFECT size = OLD.size + 1; BEGIN stp:= stp + 1; buf[stp] := x END;

FUNCTION pop: stackelt; ENTRY size >0; EFFECT size = OLD.size - 1; BEGIN pop := buf[stp]; stp := stp - 1 END;

BEGIN stp := 0 END;

" The DEPTH declaration

A recursive routine R1 is one that can call R2, which in turn calls R3, and so on until R1 is called again. A special case of recursion is a routine that calls itself. When a program uses recursion, it is necessary to prove that it is impossible to initiate an endless sequence of routine calls, none of which ever returns. Every recursi ve r outine must contain a DEPTH declaration of the form: DEPTH <integer expression>; By the scope restrictions of P ascal-F ,the integer expression may only contain parameters of and variables global to the routine. Note that a fixed point expression cannot be used. The DEPTH expression is an indication of ho wm uch time will be used by the routine. The ENTR Ya ssertion for the routine must be strong enough to imply that the DEPTH expression is nonne gative w hene verthe routine is called. From a recursi ve r outine R1 tw ok inds of calls to other routines are possible. If R1 calls itself, or if it calls a routine R2 that can initiate a chain of routine calls that e ventually leads to R1 being called again, the call is said to be a recursi ve c all. Otherwise, the call is said to be nonrecursi ve. Where verar ecursi ve r outine R1 mak es a recursi ve c all to a routine R2 (a special case is when R1 and R2 are the same routine), it must be established that an infinite chain of calls is not being initiated. Let d1 be the value (at the time R1 was called) of the DEPTH expression declared for R1. Similarly ,let d2 be the v alue (at the time R2 is being called) of the DEPTH e xpression declared for R2. The Verifier will attempt to pro ve t hat d2 is strictly less than d1. If this condition is pro vedf or e very recursi ve c all, then each call in a chain must perform a successi vely easier task, so that e very use of recursion must eventually terminate. "

EXTRA variables, functions, procedures: things not necessary to execute the program, but used in proving.

PROOF statements are statements that manipulate EXTRA variables. "The multiprogramming statements WAIT,SEND, and INIT are forbidden in PROOF statements"

"RULE functions: RULE FUNCTION <function definition> BEGIN END; Functions may be declared as RULE functions for use in proof rules. Such function definitions ha ve no body.

Rule functions are used when an expression is needed in an assertion b ut the needed expression cannot be written as a simple Pascal-F e xpression. The use of rule functions is co vered in detail in the chapter on rules.

Rule functions are restricted to results of types integer ,char, and boolean. Ar guments to rule functions may be declared as an yv alid Pascal-F type. "

" 2.4.1 Restrictions on program structure

The only legitimate ways to write non-terminating loops for a process is WHILE true DO BEGIN <loop body> END; or REPEAT <loop body> UNTIL true; This construct may not appear within an yo ther control structure. This restriction is imposed so that the V erifier can discriminate between accidental and deliberate non-terminating loops.
The iteration variable of a FOR loop may not be modified from within the loop. Further ,t he value of the iteration variable is not DEFINED after the loop terminates. The Ve rifier issues an error message whene verav ariable that is not DEFINED is accessed.

2.4.2 Restrictions on variant records. Va riant records are not permitted by the verifier.

2.4.3 Restrictions on exception handling

Pascal-F incorporates a powerful e xception handling mechanism similar to that in Ada. This version of the Verifier operates on the assumption that e xception handlers are not used in normal operation and enforces the following rules.

The RAISE statement, for exception handling, is considered to be a mechanism for reco vering from hardw are errors only .T he Verifier will try to pro ve t hat no RAISE statement is e vere xecuted.
Exception handling routines are not examined. "

" Aliasing and side effects are tw or elated phenomena that mak ep rograms difficult to understand and v erify .A liasing is the condition in which tw on ames refer to the same variable. A side effect occurs when a function modifies one of its parameters or a global variable. The Ve rifier enforces restrictions that pre vent variables from being changed in a fashion that it cannot detect, or in such a f ashion that the resulting value of the modified variable depends of the order of e valuation.

...

Because this sort of thing ((aliasing)) is far more often a cause of error than a useful feature of the language, it is prohibited.

When a function is called, V AR parameters of the function or variables global to the function can be modified. The modification of a variable by a function is called a "side effect" because the modification is usually not the primary purpose of the statement in which the function is called. Generally ,p rogramming with side effects is a dangerous practice because it is easy to forget that the side effect will tak ep lace. This danger is diminished when using the V erifier ,s ince the V erifier will detect side effects.

...

The Verifier allows statements to have s ide effects only if it can determine that the results of the side ef fects are independent of the order of e valuation in expressions.

...

The programmer can use the following simple rules to a void problems with aliasing and side effects.

1. Do not pass the same variable to tw od ifferent VAR parameters of a routine.

2. Do not pass a variable or an yc omponent thereof to a V AR parameter of a routine that sets or uses the variable as a global.

3. W ithin a function, do not modify V AR parameters or global variables, perform WA IT statements, or call a routine e xported from a monitor ,u nless the function is called only in simple assignment statements.

4. Do not use functions exported from monitors in expressions; use them only as the right side of simple assignment statements.

" 2.4.5 Restrictions on multiprogramming

The restrictions required to mak ep rograms with multiple processes v erifiable are some what se vere. Unlik et he other restrictions, which generally prohibit only language forms of little if an yr eal use, the multiprogramming restrictions can be difficult to li ve with. The object of these restrictions, as discussed earlier in this chapter ,istoa llo wt he Ve rifier (and the programmer) to generally ignore the f act that semi-concurrent operations are taking place.

The major restriction required to mak em ultiprogramming work is to require that an y code referencing a static variable must be at the same priority as the v ariable. In Pascal- F, t his is easy to enforce, because both the priority of both code and static variables is determined strictly by the priority of the module in which the ya re enclosed. The rules are as follows

No variable may be imported or exported from a monitor.
A routine that has been exported from the monitor may not be called before the monitor has been initialized with an INIT statement.
A monitor variable may not be passed as a V AR argument to a routine outside the monitor .( There are no restrictions on arguments passed by value.)
A signal may not be a component of any other type

...

When processes at dif ferent priorities must communicate, one process will ha ve t o call a routine in another module to access an ys hared v ariables. This restriction is an incenti ve t or educe access to shared variables. "

" When progressing through an array with a loop, use a FOR loop unless early e xit from the loop is planned. The Verifier supplies the proofs that FOR loops terminate. "

" The Verifier can mak eb etter deductions about addition and subtraction than it can about multiplication and di vision. It will be quite difficult to verify anything that depends on more than the most ob vious properties of multiplication and di vision. Multiplication by constants is not a problem. The Verifier can deduce the possible range of the result of a division, but little else.

Av o id scaling fixed-point operations so that truncation occurs. The Verifier can deduce the possible range of the result of a truncation, but little else. "

---

Languages for verification

BoogiePL

https://github.com/boogie-org/boogie

"...generic intermediate languages for deductive verification. None of them is specifically tailored for the analysis of C, although BoogiePL? is the intermediate language used in C analysis tools HAVOC [47] and VCC [43]. BoogiePL? and WHY both explicitate memory as a collection of heap variables, which is necessary for deductive verification, but cumbersome for most static analyses. " -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 43

Mike Barnett, Bor yuh Evan Chang, Robert Deline, Bart Jacobs, and K. Rustanm. Leino. Boogie: A modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects: 4th International Symposium, FMCO 2005, volume 4111 of Lecture Notes in Computer Science , pages 364–387. Springer, 2006

Why

The Why platform is a program verification platform.

It supports C and Java programs, and work is being done to use it for Ethereum Solidity.

" Krakatoa and Jessie are two front-ends of the Why platform for deductive program verification. Krakatoa deals with Java programs annotated in a variant of the The Java Modeling Language. Jessie deals with C programs annotated in the ANSI/ISO C Specification Language (ACSL). "

"Microsoft's Dafny -- an imperative, non-functional (I think) language -- is a very popular choice in verification challenges (alongside tools like KIV and Why3). " -- [5]

Links:

Jessie

JESSIE is an intermediate language in the Why Platform. Annotated C and Java programs are compiled to JESSE as the first step of the Why system.

" C is a complex language, both in terms of syntax and semantics. It makes it challenging to describe and understand any analysis directly performed at the level of C source programs. Instead, we translate C programs into a simpler language, JESSIE ... JESSIE is a simple typed imperative language, with precisely defined semantics, which allows for an easy exposure and understanding of analyses. ... JESSIE is a direct successor of the in termediate languages used in Caduceus and Krakatoa, which aim at deductive verification of C and JAVA languages, in the framework of the Why Platform [69]. ... JESSIE was developed in parallel with ACSL [14], the ANSI C Specification Language, with which it shares most of its constructs. The memory model constructs are essentially specific to the analysis of C programs, partly inspired by the annotation language of Caduceus, while most other logic constructs can already be found in JML. ... JESSIE follows most intermediate languages for C analysis in targeting all of C, at the exclusion of embedded assembly code, i.e. , the ability to em bed instructions in an assembly language inside a C program. Like Newspeak and Clight, JESSIE relies on a byte-level block memory model. Like CIL and Clight, JESSIE remains as much as possible at the level of types, while allowing byte-level operations. As presented in Figure 1.4, JESSIE is a target of translation from CIL, from which it inherits a collection of implementation-defined decisions w.r.t. the target architecture and the real C compiler. E.g. , we consider by default that a byte, the lowest addressable entity, is 8 bits, which is the case in almost all existing architectures.

JESSIE is original in at least two directions. First, the J ESSIE memory model and data types are notably simple while staying at the level of structured types, which allows more easily to generate annotations. Secondly, J ESSIE combines operational and logical features. Its operational part consists in statements which describe the flow of control and instructions which perform operations on data, including memory updates. Its logical part is described through first-order logic propositions , which annotate statements and functions. Statements, instructions and propositions are all built upon side-effect free terms . Contrary to Simpl, JESSIE does not offer an embedding inside a theorem prover, which means that translations to and from JESSIE must be trusted. ... JESSIE types are much simpler than C types, which accounts a lot for the gain in analyzing a JESSIE program instead of a C program. A type is either a base type or a user-defined type. Base types are mathematical integers, booleans and real numbers plus unit (the void type). User-definedtypes duplicate in J ESSIE the bound restrictions of operational C types: integer ranges have a minimal and maximal value, while pointers may be limited in the range of indices they address, with an optional minimal and maximal index. An integer range type in JESSIE is very similar to an integer type in A DA (see also [74] for a similar proposition for C, that is part of the Secure Coding initiative by CERT). The complex interplay of pointers, arrays and structures in C is replaced in JESSIE with much simpler design decisions. The only type of aggregate in JESSIE is the array of struc- tures, which can only be accessed by pointer, and the only type of pointer in JESSIE is the pointer to an array of structures. " -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 43

Links:

Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009

F*

https://www.fstar-lang.org/

Tutorials:

https://www.fstar-lang.org/tutorial/

" ... a language that works both as a proof assistant as well as a general-purpose, verification-oriented, effectful programming language. In support of these complementary roles, F⋆ is a dependently typed, higher-order, call-by-value language with primitive effects including state, exceptions, divergence and IO. Although primitive, programmers choose the granularity at which to specify effects by equipping each effect with a monadic , predicate transformer semantics. ... F⋆ is programmed (but not verified) in F⋆ , and bootstraps in both OCaml and F#. "

There is work on using F* to verify Ethereum (EVM) programs Formal Verification of Smart Contracts.

Links:

F. Swamy, N.; Hritcu, C.; Keller, C.; Rastogi, A.; Delignat-Lavaud, A.; Forest, S.; Bhargavan, K.; Fournet, C.; Strub, P.; Kohlweiss, M.; Zinzindohoue, J. K.; and Béguelin, S. Z. Dependent types and multi-monadic effects in F*. In Bodík, R.; and Majumdar, R., editor(s), Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 256--270, 2016.
Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cedric Fournet, Anitha Gollamudi, Georges Gonthier, Nadim Kobeissi, Aseem Rastogi, Thomas Sibut-Pinote, Nikhil Swamy, Santiago Zanella-Beguelin. Formal Verification of Smart Contracts
- discussion: https://www.reddit.com/r/ethereum/comments/4yzmr2/formal_verification_of_smart_contracts_a_paper/

TS*

"A gradually-typed core of JavaScript?"

Links:

N. Swamy, C. Fournet, A. Rastogi, K. Bhargavan, J. Chen, P. Strub, and G. M. Bierman. Gradual typing embedded securely in javascript. In POPL ’14, pages 425–438. ACM,2014

Dafny

"Microsoft's Dafny -- an imperative, non-functional (I think) language -- is a very popular choice in verification challenges (alongside tools like KIV and Why3). " -- [6]

Links:

SPARK Ada

https://en.wikipedia.org/wiki/SPARK_(programming_language)

"a heavy duty verifiable language, with a good track record over many years in the industry." -- [7]

"...a formally defined computer programming language based on the Ada programming language, intended for the development of high integrity software...SPARK aims to exploit the strengths of Ada while trying to eliminate all its potential ambiguities and insecurities. SPARK programs are by design meant to be unambiguous, and their behavior is required to be unaffected by the choice of Ada compiler. These goals are achieved partly by omitting some of Ada's more problematic features (such as unrestricted parallel tasking) and partly by introducing contracts which encode the application designer's intentions and requirements for certain components of a program." [8]

P

https://github.com/p-org/P

A language for asynchronous event-driven programming which can be verified using model-checking.

K

http://kframework.org/

"The goal of K is to separate specification of analysis tools from specification of particular programming languages or other models, making it easier to specify both the analysis tools and the programming languages"

example use of K to verify the Ethereum Virtual Machine (EVM):

example use of K to verify and design Iele (also discussion of creating an interpreter framework that defines an efficient interpreter via K):

https://iohk.io/blog/iele-a-new-virtual-machine-for-the-blockchain

---

https://www.figma.com/blog/how-we-built-the-figma-plugin-system/

---

https://github.com/hwayne/lets-prove-leftpad (introduction: Let's Prove Leftpad)

---

proj-plbook-plChSecureLangs

Security languages

E

CHERI

SAFE

PASCAL-F verifier

Languages for verification

BoogiePL

Why

Jessie

F*

TS*

Dafny

SPARK Ada

P

K