proj-plbook-plChRust


Rust

"Rust combines low-level control over performance with high-level convenience and safety guarantees. Better yet, it achieves these goals without requiring a garbage collector or runtime, making it possible to use Rust libraries as a “drop-in replacement” for C....What makes Rust different from other languages is its type system, which represents a refinement and codification of “best practices” that have been hammered out by generations of C and C++ programmers." -- [1]

"Rust is a language that allows you to build high level abstractions, but without giving up low-level control – that is, control of how data is represented in memory, control of which threading model you want to use etc. Rust is a language that can usually detect, during compilation, the worst parallelism and memory management errors (such as accessing data on different threads without synchronization, or using data after they have been deallocated), but gives you a hatch escape in the case you really know what you’re doing. Rust is a language that, because it has no runtime, can be used to integrate with any runtime; you can write a native extension in Rust that is called by a program node.js, or by a python program, or by a program in ruby, lua etc. and, however, you can script a program in Rust using these languages." -- Elias Gabriel Amaral da Silva

"...what Rust represents:

Pros

Tutorials and books

Best practices and tips, libraries, tools

    fn double_or_multiply(x: i32, y: Option<i32>, double: bool) -> Result<i32, &'static str> {
        if double {
            if y.is_some() {
                return Err("Y should not be set");
            }
            Ok(x * 2)
        } else {
            if y.is_none() {
                return Err("Y should be set");
            }
            Ok(x * y.unwrap())
        }
    }

Yes, I know it’s a completely contrived example, but I’m sure you’re familiar with that kind of pattern in code. The issue is that this is using the shallow aspects of Rust’s type system – you end up paying for all of Rust but only reaping the benefits of 10% of it. Compare that to what you could do by leveraging the type system fully:

    enum OpKind {
        Double(i32),
        Multiply(i32, i32),
    }

    fn double_or_multiply(input: OpKind) -> i32 {
        match input {
            OpKind::Double(x) => x * 2,
            OpKind::Multiply(x, y) => x * y,
        }
    }

" -- [5]

Old/probably out of date but otherwise look good:

Features

concurrency features

" Rust's ownership rules are as follows:

    Each value in Rust has a variable that's called its owner.
    There can only be one owner at a time.
    When the owner goes out of scope, the value will be dropped / released.

Then there are rules about references (think intelligent pointers) to owned values:

    At any given time, you can have either one mutable reference or any number of immutable references.
    References must always be valid.

Put together, these rules say:

    There is only a single canonical owner of any given value at any given time. The owner automatically releases/frees the value when it is no longer needed (just like a garbage collected language does when the reference count goes to 0).
    If there are references to an owned value, that reference must be valid (the owned value hasn't been dropped/released) and you can only have either multiple readers or a single writer (not e.g. a reader and a writer).

The implications of these rules on the behavior of Rust code are significant:

    Use after free isn't something you have to worry about because references can't point to dropped/released values.
    Buffer underruns, overflows, and other illegal memory access can't exist because references must be valid and point to an owned value / memory range.
    Memory level data races are prevented because the single writer or multiple readers rule prevents concurrent reading and writing. (An assertion here is any guards - like locks and mutexes - have appropriate barriers/fences in place to ensure correct behavior in multi-threaded contexts. The ones in the standard library should.)
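
The ownership and reference rules quoted here can be sketched as follows. This is a minimal illustration, not code from the quoted source; the names `take_ownership`, `borrowed_len`, and `ownership_demo` are hypothetical. Lines the borrow checker would reject are left commented out.

```rust
// Hypothetical helper: takes ownership of the String, which is dropped
// when this function returns.
fn take_ownership(s: String) -> usize {
    s.len()
}

// Hypothetical helper: only borrows, so the caller keeps ownership.
fn borrowed_len(s: &str) -> usize {
    s.len()
}

fn ownership_demo() -> (usize, usize, String) {
    let a = String::from("hello");
    let n1 = borrowed_len(&a); // shared borrow; `a` is still usable afterwards
    let b = a;                 // ownership moves from `a` to `b`
    // println!("{}", a);      // would not compile: use of moved value `a`

    let mut c = String::from("wor");
    {
        let w = &mut c;        // one mutable reference at a time
        w.push_str("ld");
        // let r = &c;         // would not compile: `c` is mutably borrowed
        w.push('!');
    }                          // the mutable borrow has ended here
    let r1 = &c;               // any number of shared references is fine
    let r2 = &c;
    assert_eq!(r1, r2);

    let n2 = take_ownership(b); // `b` is moved into the function and dropped there
    (n1, n2, c)
}
```

Calling `ownership_demo()` yields the two lengths and the final string, which is moved out to the caller rather than copied.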

" -- [9]

A great explanation of ADTs and their use for encoding and enforcing invariants, with an example, is in the section "Encoding and Enforcing Invariants in the Type System" of https://gregoryszorc.com/blog/2021/04/13/rust-is-for-professionals/
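
As a rough illustration of the general idea (this is not the example from the linked post), an invariant can be checked once in a constructor and then carried by the type. `Percent` and `apply_discount` are hypothetical names, and the guarantee assumes the type lives in its own module so that outside code cannot build a `Percent` directly.

```rust
/// Hypothetical type: a percentage guaranteed to be in 0..=100.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percent(u8);

impl Percent {
    /// The intended way to build a `Percent`; out-of-range input is rejected here.
    pub fn new(value: u8) -> Option<Percent> {
        if value <= 100 {
            Some(Percent(value))
        } else {
            None
        }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

/// Because the argument is a `Percent`, no range check is repeated here.
pub fn apply_discount(price_cents: u32, discount: Percent) -> u32 {
    let price = u64::from(price_cents);
    let off = price * u64::from(discount.get()) / 100;
    (price - off) as u32
}
```

With this shape, `Percent::new(101)` is `None`, and `apply_discount` cannot be handed an out-of-range percentage by code outside the defining module.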

" some of my favorite (other) parts about the language, in no particular order.

Native types in Rust are intelligently named: i32, u32, f32, f64, and so on. These are, indisputably, the correct names for native types.

Destructuring assignment: awesome. All languages which don’t have it should adopt it. Simple example:

    let ((a, b), c) = ((1, 2), 3); // a, b, and c are now bound

    let (d, (e, f)) = ((4, 5), 6); // compile-time error: mismatched types

More complex example:

    // What is a rectangle?
    struct Rect { origin: (u32, u32), size: (u32, u32) }

    // Create an instance
    let r1 = Rect { origin: (1, 2), size: (6, 5) };

    // Now use destructuring assignment
    let Rect { origin: (x, y), size: (width, height) } = r1;
    // x, y, width, and height are now bound

The match keyword: also awesome, basically a switch with destructuring assignment.

Like if, match is an expression, not a statement, so it can be used as an rvalue. But unlike if, match doesn’t suffer from the phantom-else problem. The compiler uses type-checking to guarantee that the match will match something — or speaking more precisely, the compiler will complain and error out if a no-match situation is possible. (You can always use a bare underscore _ as a catchall expression.)

The match keyword is an excellent language feature, but there are a couple of shortcomings that prevent my full enjoyment of it. The first is that the only way to express equality in the destructured assignments is to use guards. That is, you can’t do this:

    match (1, 2, 3) { (a, a, a) => "equal!", _ => "not equal!", }

Instead, you have to do this:

    match (1, 2, 3) { (a, b, c) if a == b && b == c => "equal!", _ => "not equal!", }

Erlang allows the former pattern, which makes for much more succinct code than requiring separate assignments for things that end up being the same anyway. It would be handy if Rust offered a similar facility.

My second qualm with Mr. Match Keyword is the way the compiler determines completeness. I said before “the compiler will complain and error out if a no-match situation is possible,” but it would be better if that statement read if and only if. Rust uses something called Algebraic Data Types to analyze the match patterns, which sounds fancy and I only sort-of understand it. But in its analysis, the compiler only looks at types and discrete enumerations; it cannot, for example, tell whether every possible integer value has been considered. This construction, for instance, results in a compiler error:

    match 100 { y if y > 0 => "positive", y if y == 0 => "zero", y if y < 0 => "negative", };

The pattern list looks pretty exhaustive to me, but Rust wouldn’t know it. I’m sure that someone who is versed in type theory will send me an email explaining how what I want is impossible unless P=NP, or something like that, but all I’m saying is, it’d be a nice feature to have. Are “Algebraic Data Values” a thing? They should be.
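
A hedged note on the exhaustiveness complaint: guards are still opaque to the exhaustiveness check, but current Rust does check integer range patterns for full coverage. The sketch below shows three ways to write the sign test so that it compiles; the function names are hypothetical.

```rust
use std::cmp::Ordering;

// Guards are not analyzed for exhaustiveness, so the last arm must be a catch-all.
fn sign_with_guards(n: i32) -> &'static str {
    match n {
        y if y > 0 => "positive",
        y if y == 0 => "zero",
        _ => "negative",
    }
}

// Integer range patterns are analyzed: these three arms cover every i32,
// so no catch-all arm is needed.
fn sign_with_ranges(n: i32) -> &'static str {
    match n {
        i32::MIN..=-1 => "negative",
        0 => "zero",
        1..=i32::MAX => "positive",
    }
}

// Turning the comparison into an enum lets the ordinary enum check apply.
fn sign_with_ordering(n: i32) -> &'static str {
    match n.cmp(&0) {
        Ordering::Greater => "positive",
        Ordering::Equal => "zero",
        Ordering::Less => "negative",
    }
}
```

Removing any arm from `sign_with_ranges` or `sign_with_ordering` is a compile-time error, which is the "if and only if" behavior the author asks for, at least for patterns without guards.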

It’s a small touch, but Rust lets you nest function definitions, like so:

    fn a_function() {
        fn b_function() {
            fn c_function() { }
            c_function(); // works
        }
        b_function(); // works
        c_function(); // error: unresolved name `c_function`
    }

Neat, huh? With other languages, I’m never quite sure where to put helper functions. ... Mutability rules: also great. Variables are immutable by default, and mutable with the mut keyword. It took me a little while to come to grips with the mutable reference operator &mut, but &mut and I now have a kind of respectful understanding, I think. Data, by the way, inherits the mutability of its enclosing structure. This is in contrast to C, where I feel like I have to write const in about 8 different places just to be double-extra sure, only to have someone else’s cast operator make a mockery of all my precautions.

Functions in Rust are dispatched statically, if the actual data type is known at compile-time, or dynamically, if only the interface is known. (Interfaces in Rust are called “traits”.) As an added bonus, there’s a feature called “type erasure” so you can force a dynamic dispatch to prevent compiling lots of pointlessly specialized functions. This is a good compromise between flexibility and performance, while remaining more or less transparent to the typical user.
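
The two dispatch modes can be sketched with a generic function and a trait object. The `Shape` trait and the concrete types are hypothetical and exist only for this illustration.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

// Static dispatch: the compiler generates one specialized copy per concrete `T`.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch: a single compiled function; the call goes through a vtable.
fn area_dynamic(shape: &dyn Shape) -> f64 {
    shape.area()
}

// Trait objects also allow mixing concrete types in one collection.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

Choosing `&dyn Shape` over `<T: Shape>` is the "type erasure" mentioned in the quote: it trades a little call overhead for less generated code.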

Is resource acquisition the same thing as initialization? I’m not sure, but C++ programmers will appreciate Rust’s capacity for RAII-style programming, the happy place where all your memory is freed and all your file descriptors are closed in the course of normal object deallocation. You don’t need to explicitly close or free most things in Rust, even in a deferred manner as in Go, because the Rust compiler figures it out in advance. The RAII pattern works well here because (like C++) Rust doesn’t have a garbage collector, so you won’t have open file descriptors floating around all week waiting for Garbage Pickup Day. " [10]
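
A small sketch of the RAII behavior described in the quote, using the standard `Drop` trait. `Resource` and `raii_demo` are hypothetical; the shared log stands in for a real side effect such as closing a file descriptor.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource that records its own cleanup in a shared log.
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    // Runs automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn raii_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Resource { name: "a", log: Rc::clone(&log) };
        let _b = Resource { name: "b", log: Rc::clone(&log) };
        log.borrow_mut().push("end of scope".to_string());
    } // no explicit close or free: `_b` is dropped first, then `_a`
    let events = log.borrow().clone();
    events
}
```

The log returned by `raii_demo()` shows cleanup happening at the closing brace, in reverse declaration order.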

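Rust's loop is an expression; break can return a value from it (while and for loops always evaluate to ()).

Rust blocks are expressions; a block's value is its final expression.

Rust has three loop constructs: loop, while, and for.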
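
These notes can be illustrated with a short sketch; the function names are hypothetical.

```rust
// `loop` is an expression: `break value` supplies its result.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut n = 1;
    loop {
        if n > limit {
            break n;
        }
        n *= 2;
    }
}

// A block is an expression whose value is its final expression (no semicolon).
fn block_value(x: i32) -> i32 {
    let y = {
        let doubled = x * 2;
        doubled + 1
    };
    y
}

// `for` and `while` are expressions too, but their value is always `()`.
fn sum_to(n: u32) -> u32 {
    let mut total = 0;
    for i in 1..=n {
        total += i;
    }
    let mut remaining = n;
    while remaining > 0 {
        remaining -= 1;
    }
    total
}
```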

" Rust’s price for improved control is the curse of choice:

    struct Foo { bar: Bar }
    struct Foo<'a> { bar: &'a Bar }
    struct Foo<'a> { bar: &'a mut Bar }
    struct Foo { bar: Box<Bar> }
    struct Foo { bar: Rc<Bar> }
    struct Foo { bar: Arc<Bar> }

" -- https://matklad.github.io/2020/09/20/why-not-rust.html

Opinions

stateful async would be writing IO. You've passed in a buffer, the length to copy from the buffer. In the continuation, you'd need to know which original call you were working with so that you can correlate it with those parameters you passed through.

    var state = socket.read(buffer);
    while (!state.poll()) {}
    state.bytesRead...

Stateless async is accepting a connection. In 95% of servers, you just care that a connection was accepted; you don't have any state that persists across the continuation:

    while (!listeningSocket.poll()) {}
    var socket = listeningSocket.accept();

Stateless async skirts around many of the issues that Rust async can have (because Pin etc. has to happen because of state). ... when you poll a future, you pass in a context. The future derives a "waker" object from this context which it can store, and use to later trigger itself to be re-polled.

By using a context with a custom "waker" implementation, you can learn which future specifically needs to be re-polled.

Normally only the executor would provide the waker implementation, so you only learn which top-level future (task) needs to be re-polled, but not what specific future within that task is ready to proceed. However, some future combinators also use a custom waker so they can be more precise about which specific future within the task should be re-polled. " [48] and [49] "when implementing poll based IO with rust async, typically you have code like “select(); waker.wake()” on a worker thread. Select blocks. Waking tells the executor to poll the related future again, from the top of its tree. The waker implementation may indeed cause an executor thread to stop waiting, it depends on the implementation. It could also be the case that the executor is already awake and the future is simply added to a synchronised queue. Etc."[50]
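
The context and waker mechanics described above can be sketched by polling a future by hand with a custom waker, using only the standard library (`std::task::Wake`). `CountingWaker`, `YieldOnce`, and `poll_by_hand` are hypothetical names; a real executor would park or queue tasks instead of looping.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical waker: it only counts how often it was asked to wake.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical future: pending on the first poll (after requesting a re-poll
// through the waker taken from the context), ready on the second poll.
struct YieldOnce {
    polled: bool,
}

impl Future for YieldOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(42)
        } else {
            self.polled = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A toy "executor": builds the context from the custom waker and re-polls.
fn poll_by_hand() -> (u32, usize, usize) {
    let counter = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(Arc::clone(&counter));
    let mut cx = Context::from_waker(&waker);

    let mut fut = YieldOnce { polled: false };
    let mut fut = Pin::new(&mut fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return (value, polls, counter.wakes.load(Ordering::SeqCst));
        }
    }
}
```

`poll_by_hand()` returns the output, the number of polls, and the number of wake calls. Giving each task or sub-future its own waker instance is how an executor or combinator can tell which one asked to be re-polled.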

Example of: " For example, this code:

fn bar() -> i32 { 5 }

fn foo() -> &'static i32 { &bar() }

gives this error:

    error[E0716]: temporary value dropped while borrowed
     --> src/lib.rs:6:6
      |
    6 |     &bar()
      |      ^^^^^ creates a temporary which is freed while still in use
    7 | }
      | - temporary value is freed at the end of this statement
      |
      = note: borrowed value must be valid for the static lifetime...

bar() produces a value, and so &bar() would produce a reference to a value on foo()‘s stack. Returning it would be a dangling pointer. In a system without automatic memory management, this would cause a dangling pointer. " -- [72]
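
For contrast, here is a sketch of some ways to write `foo` that do compile. The `foo_*` names and the `FIVE` static are hypothetical additions, not part of the quoted example.

```rust
fn bar() -> i32 {
    5
}

// Option 1: return the value itself instead of a reference to a temporary.
fn foo_owned() -> i32 {
    bar()
}

// Option 2: return a reference to data that really does live for 'static.
static FIVE: i32 = 5;

fn foo_static() -> &'static i32 {
    &FIVE
}

// Option 3: move the value to the heap and hand ownership to the caller.
fn foo_boxed() -> Box<i32> {
    Box::new(bar())
}
```

In each case the returned value or reference outlives the call to the function, so nothing dangles.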

    fn x() -> Vec<i32> {
        let mut x = (0..3).collect();
        x.sort(); // Calling any method of Vec
        x // Cannot infer that `x` is `Vec<i32>` because a method was called
    }
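
A sketch of the usual workarounds: give the compiler the collection type before the method call, either with an annotation on the binding or with a turbofish on `collect`. The function names are hypothetical, and `.rev()` is added only so that the sort has something to do.

```rust
// Annotating the binding tells `collect` which collection to build.
fn sorted_with_annotation() -> Vec<i32> {
    let mut x: Vec<i32> = (0..3).rev().collect();
    x.sort();
    x
}

// The turbofish form does the same at the call site; `_` leaves the
// element type to be inferred from the function's return type.
fn sorted_with_turbofish() -> Vec<i32> {
    let mut x = (0..3).rev().collect::<Vec<_>>();
    x.sort();
    x
}
```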

Type system also helps, because e.g. thread-safety is described in types, rather than documentation prose. Rust also made some conscious design decisions to favor libraries, e.g. borrow checking is based only on interfaces to avoid exposing implementation details (this wasn’t an easy decision, because it makes getters/setters overly restrictive in Rust). " [92]
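
As a sketch of thread-safety being described in types: `std::thread::spawn` requires its closure to be `Send`, so sharing a counter across threads only compiles with thread-safe wrappers such as `Arc` and `Mutex`. `parallel_count` is a hypothetical name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Swapping `Arc` for `Rc` here is rejected at compile time,
    // because `Rc` is not `Send` and cannot cross a thread boundary.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The data is only reachable through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```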

Opinionated Comparisons

RC) to also adopt some kind of early reclamation, similarly to Swift's ongoing approach, and we will reach a good enough situation and that will be it. " [110]
Eventually I get over it but I find TypeScript so much nicer for describing the types of my programs. " [130]
"Oh yes. I wish TypeScript had more Rust, but I also wish Rust had row-level polymorphism." [131]
"Maybe someday https://github.com/tc39/proposal-pattern-matching will happen. That will be a great day." [132]
Baz;

Dev blogs

Best practices

Retrospectives

Rust Gotchas

enum MyValue { Digit(i32) }

fn main() { let x = MyValue::Digit(10); let y = x; let z = x; }

The reason is that z might (later) mess with x, leaving y in an invalid state (a consequence of Rust’s strict memory checking — more on that later). Fair enough. But then changing the top of the file to:

#[derive(Copy, Clone)] enum MyValue { Digit(i32) }

makes the program compile. Now, instead of binding y to the value represented by x, the assignment operator copies the value of x to y. So z can have no possible effect on y, and the memory-safety gods are happy.

To me it seems a little strange that a pragma on the enum affects its assignment semantics. It makes it difficult to reason about a program without first reading all the definitions. The binding-copy duality, by the way, is another artifact of Rust’s mixed-paradigm heritage. Whereas functional languages tend to use bindings to map variable names to their values in the invisible spirit world of no-copy immutable data, imperative languages take names at face value, and assignment is always a copy (copying a pointer, perhaps, but still a copy). By attempting to straddle both worlds, Rust rather inelegantly overloads the assignment operator to mean either binding or copying. " [175] this is partially addressed in [176]
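
A side-by-side sketch of the two assignment behaviors discussed in the quote. `Token` and `Value` are hypothetical stand-ins for the enum with and without `Copy`.

```rust
// Clone but not Copy: assignment moves, duplication must be explicit.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Digit(i32),
}

// Copy: assignment duplicates the value and the source stays usable.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Digit(i32),
}

fn move_vs_copy() -> (Token, Token, Value, Value, Value) {
    let t = Token::Digit(10);
    let t2 = t.clone(); // explicit duplicate
    let t3 = t;         // move: `t` is no longer usable
    // let t4 = t;      // would not compile: use of moved value `t`

    let v = Value::Digit(10);
    let v2 = v;         // copy
    let v3 = v;         // still fine, `v` was copied rather than moved
    (t2, t3, v, v2, v3)
}
```

Without `Copy`, the duplication is visible at the call site as `.clone()`, which is one way to keep the assignment semantics readable without consulting the type definition.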

Suggestions and suggested variants

Rust Features

Ownership types

Links:

Older discussions (Rust has probably changed a lot since these):

Lifetimes (related to ownership types)

Procedural Macros

Rust custom derive and procedural macros: https://doc.rust-lang.org/book/procedural-macros.html

Constant expression

) and let statements within the same constant.
        assignment expressions
        compound assignment expressions
        expression statements
    Field expressions.
    Index expressions, array indexing or slice with a usize.
    Range expressions.
    Closure expressions which don't capture variables from the environment.
    Built-in negation, arithmetic, logical, comparison or lazy boolean operators used on integer and floating point types, bool, and char.
    Shared borrows, except if applied to a type with interior mutability.
    The dereference operator.
    Grouped expressions.
    Cast expressions, except pointer to address and function pointer to address casts.
    Calls of const functions and const methods."
casting an array to a slice
operators
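
A few of the forms in the list above, sketched as actual constant items. All names are hypothetical.

```rust
// A const function can be called in a constant context.
const fn square(x: u32) -> u32 {
    x * x
}

const SIDE: u32 = 4;
const AREA: u32 = square(SIDE);                 // call of a const function
const TABLE: [u32; 3] = [1, 2, 3];
const SECOND: u32 = TABLE[1];                   // index expression with a usize
const IS_BIG: bool = AREA > 10 && SECOND == 2;  // comparison and lazy boolean operators

// A cast expression used where a constant is required (an array length).
static BUFFER: [u8; AREA as usize] = [0; AREA as usize];

// Casting (coercing) an array to a slice.
fn table_as_slice() -> &'static [u32] {
    &TABLE
}
```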

Rust Internals and implementations

Core data structures: todo

Number representations

Integers

Floating points todo

array representation

variable-length lists: todo

multidimensional arrays: todo

limits on sizes of the above

string representation

Representation of structures with fields

ABI

Rust compiler implementation

MIR

https://rustc-dev-guide.rust-lang.org/mir/index.html

Upcoming new borrow checker Polonius

https://github.com/rust-lang/polonius http://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/ https://rust-lang.github.io/polonius/

Academic research inspiring or about Rust

https://doc.rust-lang.org/1.2.0/book/academic-research.html

Misc Rust implementation

Rust tests

Rust variants

The core language in the RustBelt paper:

Rust Links