Bayle Shanks's website: proj-oot-old-150618-ootDesignChoices

High-level design criteria

In roughly descending order of importance to Oot.

Turing completeness

A given, but i'm just being comprehensive here.

Unambiguous

Computer programming languages are defined as languages intended for describing computation, but another thing that distinguishes them from human language is their unambiguity. So, programming languages might be said to be languages for defining computation, not just for describing them.

However, i have a feeling that human language gets much of its conciseness (e.g. for many tasks, you could get a human to do the same task as a computer with less characters in the human instructions than in the computer program) by tolerating ambiguity. If someone could figure out a way to make a a programming language that worked like that, that would be neat. So unambiguity isn't actually an end-goal for me. But since i can't figure out how to make a programming language that works like this, for now i'll stick with unambiguous languages.

Succinctness

Paul Graham justifies this well at http://www.paulgraham.com/power.html (although i disagree that 'succinctness is power'; i think language power is more what i call 'extensibility', below). Succinctness is not just efficiency of writing but seems to (inversely) correspond to how much mental resources an expert must expend to keep part of a program in one's head.

An ideally succinct language would allow you to write a large project using much less lines of code than it would take in another language.

Low viscosity

Viscosity mean how hard is it to change a program. (I stole this term from http://en.wikipedia.org/wiki/Cognitive_dimensions_of_notations . My usage of this term also incorporates what they call "Premature commitment")

Code reuse

An ideally reusable language would never require you to rewrite something very similar to something that has already been written; you'd always be able to adapt the existing code to your situation and reuse it.

In practice, code reuse serves similar goals as succintness.

Readability

The tendency of source code written in the language to be easy to read by someone other than the writer. This is an interesting property because it relates to a community of people rather than to individuals; it spotlights that an important function of a computer programming language is communication between two humans, in addition to its function as communication between a human and a computer. Of course, readability would also be important even if you were the only person on Earth, because after a lot of time passes, it can be hard to read your own code, too (you could imagine a community composed of your past and future selves).

Succinctness is sometimes in conflict with readability, because sometimes a language achieves succinctness by having code that takes a lot of effort to 'decode' into primitives that are intuitive for the human mind. But sometimes it is not, because if the language is succinct, there is less to read.

A case study in readability is to contrast Perl (unreadable) and Python (readable). These languages share many similarities and they are prototypical 'scripting languages' which played similar roles.

See http://www.python.org/dev/peps/pep-0020/ for some principals of Python's design, many of which contribute to readability. Here are the ones i think contribute most to readability, in descending order, with my interpretation/comments:

"Readability counts." Explicitly prioritize readability when designing the language.
"There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch." If the set of common idioms is small, then readers are less likely to encounter code that does things in a clever way that they have to think about.
"Explicit is better than implicit." This one is a direct tradeoff between succintness and readability. Sometimes it's good to force code to say stuff that could be logically inferred, just so the reader doesn't have to think as much.
"Simple is better than complex. Complex is better than complicated." It's more readable to have constructs with fewer parts, unless this can only be achieved by having complicated semantics (essentially inferring many parts from few), in which case it's better just to have many parts explicitly.
"Flat is better than nested." It takes more mental effort to hold a nested structure of code in your head.
"Sparse is better than dense." Lots of whitespace makes code more readable.
"Special cases aren't special enough to break the rules. Although practicality beats purity.". Irregularity implies less readability; but don't take regularity TOO far.

An ideally readable language would make it so that an intermediate-level programmer could easily understand any code written in the language.

Code reuse can increase readability.

For Oot, i agree most with "one obvious way to do it" (oowtdi), "Simple is better than complex" (few parts), and "Special cases aren't special enough to break the rules" (regularity). I think that in many cases, succintness trumps "Explicit is better than implicit" (e.g. i hate typing 'self' all the time in Python, and i dont care for the colons in Python either; i think these sorts of things help learnability a lot but dont help the expert reader much), "Flat is better than nested" and "Sparse is better than dense" (i agree that is is more effort to read dense, nested code, but i think it's worth it because then you can fit more code on the page, which makes the program as a whole easier to grok even though it makes it harder to read each piece of it).

Another part of readability is how easy it is to document things. If it is easier, then people will tend to document more. The more extensible the language is, the more that documentation is needed, because you can't take the semantics of anything for granted (as pointed out by http://pointersgonewild.wordpress.com/2012/04/11/why-lisp-never-took-off/ )

A Python convention is to explicitly import each symbol used which is defined in a library. This allows a human reader to know where a symbol came from by reading only the source file in which the symbol is used. However, in theory an automated tool could tell the human reader where a symbol came from even without this. My notion of readability assumes that the reader has access to all of the automated tools that come with the language.

Debuggability

(todo)

Debuggability is often related to readability: how easily can you understand what the compiler or interpreter is trying to tell you.

Safety

Or, lack of error-prone-ness. The metric is not how easy is it to write code, but how easy is it to write error-free code.

An ideally safe language would almost never have any bugs except conceptual bugs -- when you have a bug, it would be easy to explain what the bug was to anyone else working on a similar task in a different language, without explaining anything about your language.

Examples of safety are:

hygienic macros (to prevent problems like those described with C macros in http://lists.warhead.org.uk/pipermail/iwe/2005-July/000130.html )
strong static typing
garbage-collection

Security and sandboxability

The language should allow one piece of code to run another piece of code with limited privilages. Perhaps it should also support trusted computing to prevent this ( https://bitcointalk.org/index.php?topic=86278.0 ) but i'm worried about this ( http://www.gnu.org/philosophy/can-you-trust.html ).

I don't know much about this so i'll probably get it wrong.

Extensibility

Any Turing-complete language can simulate any other one, so what could extensibility mean? I think it means not just the ability to make one language act like another language, but to get it to do so by means of source code that is similar in structure to what you'd give the other language; and also without a huge amount of code (e.g. without an interpreter). That is, how succinctly can you coax this language to take code which is similar in structure to the code you would give some other language, and have this language do with that the same thing that the other language would do? For instance, in Prolog you can say:

    sibling(X,Y) :-
            parentS(F,M,X),
            parentS(F,M,Y),
            X \= Y.

    mother(m,x).
    mother(m,y).
    father(f,b).
    father(f,c).
    mother(X,Y) :- parentS(_,X,Y).
    father(X,Y) :- parentS(X,_,Y).

    sibling(x,y).

How much code does it take to be able to give code structurally similar to that to Python and to get results like that?

Extensibility is usually used as a subjective concept. Out of all of the mathematically possible languages (computable total functions from source code to behavior), most of them are senseless confusing things that map short strings to what would be arbitrarily long detailed programs in, say, Python. Really, we only care about extensibility relative to some distribution of possible languages that people tend to want to mimic, a distribution that occupies a tiny subset of all possible languages. But since new languages are being invented all the time, we don't know precisely how to characterize this 'useful' subset. Extensibility is usually characterized by finding other known languages and trying to express interesting constructs from that language in this one, to see if it can be done using a similar form and relatively succinctly. But the acid test is how well this language can express constructs that no one had yet thought of at the time of this language's creation.

An ideally extensible language would (assuming it is efficient, had good libraries, etc) be the perfect language for writing programming languages. In fact, it would be so perfect that all other programming languages would, from that time forward, be developed in this language.

Extensibility is in conflict with readability. Imagine a language which was capable of easily mimicing every other existing programming language (perhaps you can have a header 'perl' for a section of code that is like perl, then 'scheme' for a section like scheme, etc). Different authors would write code using different combinations of mimiced languages, and the reader would have to know how to read all of them, which is equivalent to being good at reading every existing programming language; a formidable task.

Extensibility is in conflict with safety. Extensible languages allow you to easily whip up abstract formal systems which are abstracted very far from any typical concrete domain, systems with a lot of 'leverage' in the sense that a small change in the source code can cause a large change in the structure of the semantics. This is dangerous because (a) intuition about systems and experience about the hazards in systems derive from concrete domains, but since you can easily create systems abstracted far from typical concrete domains, you can easily venture into untested and confusing waters, and (b) the leverage combined with the lack of intuition makes it easier to have 'false cognates', operations on the source code which you think will do one thing but actually do something else.

In real life, parts of code which do metaprogramming are often described as 'black magic' due to their irreadability (it's hard to figure out how they work) and lack of safety (it's hard to change them without breaking them).

In order to fight these tendencies, Oot provides many different types of metaprogramming facilities, on a ladder from less powerful to more powerful. The idea is that each author should use the least powerful metaprogramming facility that they can to accomplish their goal; this way, one incurs the minimum amount of damage to readability and safety; the magic is only as black as it needs to be.

Language size

An ideally small language would be able to be defined in just one page of plain English.

Small size is very important to Oot, although i don't know why.

Another usage of the term 'language size' is how many of the standard constructs and library functions and idioms you need to understand to be comfortable reading typical real-world code written in the language. I don't care about that as much. In that sense, Oot will probably end up being larger than i'd like, but not huge.

Orthogonality

In an ideally orthogonal language, none of its constructs would be able to be implemented in terms of the others.

If 'language' is understood in terms of the language core, this is very important to Oot, although, like size, i don't know why.

If 'language' is understood in terms of the set of typically used standard constructs and library functions, this is important to Oot, but only for its oowtdi value. So, the real values for Oot are code reuse and readability, not orthogonality for its own sake, at least when talking about all standard parts of the language, not just the core.

Regularity

An ideally regular language would be defined by a small number of simple rules with no exceptions.

I don't know why this is important (aside from learnability) but i feel intuitively that it is. I intend for Oot to be regular.

Interoperability

Interoperability refers to how easy it is to compose a system out of some parts written in this language and some parts written in other languages.

An ideally interoperable language would be able to call and be called from any other language, transparently converting data to and from the other languages' native data structures.

I don't know anything about how to create an interoperable language. I guess much of it is having 'a good FFI', but i'm not sure what characteristics a good FFI has, or what other features are useful outside of a good FFI. I'm hoping to look at Clojure for some pointers. Any advice? Email me please:

http://en.wikipedia.org/wiki/Cognitive_dimensions_of_notations

Gravity

Communal gravity is the tendency of a language to resist splintering. The community of an ideally grave language would never splinter into dialects.

Extensibility is in conflict with communal gravity. It has been said that Lisp splinters easily because it is so easy to create your own Lisp dialect by extending an existing Lisp with macros (see http://www.winestockwebdesign.com/Essays/Lisp_Curse.html ). For Oot, extensibility is a high priority, so gravity will have to suffer. We hope that the ladder of metaprogramming facilities will fight the splintering tendency, as well as our effort to consider merging popular improvements into Oot core (although we plan to reject more than we accept).

Another Oot increases communal gravity is by having a single canonical implementation (see below).

High-level design criteria that Oot values somewhat less than other languages

Learnability

An ideally learnable language would be learnable by a child in one day.

Learnability is not a major goal for Oot. I hope that the other goals of Oot (e.g. small language size, regularity) also cause it to be learnable, but it's not a major goal in and of itself. For example, in Python, the colon was added because it makes the language easier to learn ( http://python-history.blogspot.com/2009/02/early-language-design-and-development.html ); i wouldn't add an extra required delimiter to Oot just to make it easier at the beginning. Another example, some languages also adopt so-called Algol-style syntax merely because programmers are already used to it (apparently even though Javascript was based on Scheme and Self, "Netscape management also decided that JavaScript’s? syntax had to look like Java’s. That ruled out adopting existing scripting languages such as Perl, Python, TCL and Scheme" -- http://www.2ality.com/2011/03/javascript-how-it-all-began.html); i try to stick to what i perceive as popular syntax when there's no reason not to, but i don't mind throwing that away if i have a better idea.

Compilation speed

Compilation speed is in conflict with succinctness, extensibility, and safety. Compilation speed is not a major goal for Oot, although it's not totally unimportant either -- we don't want people having to go out to lunch every time they want to change a small project.

Execution speed

Execution speed is in conflict with succinctness and safety. Execution speed is not a major goal for Oot. If we create a language that is just incredibly awesome but way too slow, and we have to remove some power in exchange for speed, that's fine, but we'll cross that bridge if we come to it.

Syntax

More specific design criteria for syntax

Editorial independence

I've customized my favorite text editor a bit and so i like to edit everything with that, rather than run a specific IDE for each language. So, i want Oot to work well with almost any text editor, even Notepad. That rules out some design choices; e.g. w/r/t Lisp, when people complain that it's hard to count parentheses, Lispers say, 'your editor should do that'; w/r/t Python, when people complain that significant indentation is a pain to maintain, Pythonistas say, 'your editor should do that'. That excuse is not available to us.

Homeoiconicity

Wikipedia defines homeoiconicity as "a property of some programming languages, in which the primary representation of programs is also a data structure in a primitive type of the language itself". Homeoiconic languages are often described as 'syntaxless'.

Unfortunately this concept is imprecise. Perhaps you suppose that the primary representation is the source code; then should any language with strings be considered homeoiconic? Perhaps you suppose that the primary language is the AST; then since Python's AST can be represented as a list or a dict, should Python be considered homeoiconic? Perhaps you suppose that homeoiconicity implies that the structure of the source code should mimic the structure of the AST ("the internal and external representations are essentially the same" to use Alan Kay's terminology), e.g. parsing should involve ONLY looking for matched grouping constructs such as parens; but even most Lisps have a few exceptions to that such comment-to-end-of-line and single-quote expansion, e.g. 'x --> (quote x).

The benefit of homeoiconicity is purportedly that it makes macros easier. The cost is that your choice of syntax is very restricted (so much so that homeoiconic languages are often described as 'syntaxless').

I question that purported benefit; shouldn't any language which provides language-level support for easily manipulating the AST be easy for macros? However Steve Yegge says otherwise: "Unfortunately, even Ruby and Python (which "feel" simpler, syntactically) both also have very complicated grammars, making it nontrivial to write code that processes them, even with parsers that hand you the AST directly." -- http://steve-yegge.blogspot.com/2007/02/next-big-language.html . I haven't actually tried to do things with the Ruby or Python ASTs so i'll have to take his word for it.

My feeling is that homeoiconicity is a very good thing, but not an absolute requirement. I think a language which is 'almost homeoiconic' is okay.

Oot's deviations from homeoiconicity include (todo: make this a comprehensive list):

grouping aids processed at parsing time:
- tight infixification
- loose infixification
- auto(semi)colons at some EOLs
- autoclosing parens at the end of (semi)colon-separated regions and blocks
optional left and right associativity (with . and

)

= (the assignment operator)

Read left-to-right (standard orientation)

That is to say, in Oot if you want to apply f to y, add 1, and assign the result to x, you write "x = 1++(f y)", not "1++(y f) = x". "1++(y f) = x" (the nonstandard way that we didn't adopt) would be clearer, because when you read it from left to right, that follows the order that data actually flows thru the expression.

However, programming text is usually left-aligned, which makes it easy to scan down the screen with your eye and look at the leftmost words; and hard to scan down looking at the rightmost words. A common use of scanning is to look for where some variable is defined, or when some side-effectful command is executed (e.g. 'print "hello"'). So, we want the 'x' in 'x = 3', and the 'print' in 'print "hello"' to be on the left. We'd like to keep things regular, so this implies that we should use the usual ordering.

Whitespace

I'm a fan of the concept of significant whitespace. I don't care what style my whitespace is, and i hate reformatting my code to meet some whitespace standard. I'm perfectly happy to let whitespace have some semantics if it makes code easier to read or quicker to type; i'll give up the freedom to decide if my curly brace should go at the end of this line or the beginning of the next. However, the typical whitespace semantics (Python-style significant indentation) doesn't copy-and-paste well (or maintain well) with dumb code editors. So significant indentation is out, but other kinds of significant whitespace is in.

Precedence

On one end of the scale is custom precedence (e.g. Haskell). On the other is no precedence (e.g. Lisp).

I don't like custom precedence. Custom precedence severely impacts readability because a reader can't even parse a line of some else's code without first looking up the precedence of all of the user-defined operators in that code.

For the sake of a small, regular language, Oot eschews the notion of a long list of keywords with varying precedence. The exception is the assignment operator.

Is Oot a Lisp?

I'm not going to be drawn into this argument. If i say yes, Lisp fans will say i am besmirching the name of Lisp with my unworthy language. If i say no, Lisp fans will say i am claiming to have invented something new when i really just made a poor version of Lisp. Different folks have different definitions of Lisp. So, if you tell me what definition of Lisp you like, i'll tell you if Oot meets that definition.

Unicode

Don't know enough to judge. My guess is that to properly support unicode we need:

the default string type to support unicode
the source files to be able to be unicode

Semantics

Single data structure

One way to achieve code reuse is through having a few standard data structure interfaces. Since code often cannot easily be generalized to apply to a data structure with a different interface, the fewer of these there are, the more reuse will be possible. Python and Clojure are particularly good about this. This is an application of Python's adage, "There should be one-- and preferably only one --obvious way to do it."

Python still has some trouble; I have often found myself having to cast back and forth between Python arrays (which work in standard list comprehensions) and Python (numpy) arrays. Haskell is close to the mark but also has some trouble; you cannot just take any library that was written to use Haskell strings (linked lists of characters) and feed it Haskell ByteStrings? (arrays of characters) (annoyingly, if everything in Haskell uniformally used typeclasses, this wouldn't be a problem -- but Haskell syntax makes typeclasses more verbose than vanilla data types, and in addition, many programs won't compile if you don't use a vanilla type anywhere, because the compiler doesn't know how to choose which type in the typeclass you want).

Like Lisp, Oot excels here with only one primary data structure. Unlike Lisp and Haskell, the data structure is a network, not just a list.

Macros

Oot uses macros to help with extensibility and succinctness.

Optional static typing

Static typing is great for writing code in IDEs and for safety, but it tends to impede viscocity. Also, if the typing system is simple, it impedes extensibility, but if it is more expressive, it impedes readability. Type inference works well with a simple type system but is too unreadable (during debugging) with an expressive one.

There is a sense that the perfect type system would be simple enough to read and to debug inferred types, yet expressive enough not to impede the programmer too much; however to my knowledge such a system has not yet been discovered. As far as i can tell, the state of the art is something like Haskell's type system, which is sufficiently expressive for most purposes but way too hard to understand. And as far as i can tell, type system design is a popular research problem which occupies many of the best minds of computer science. So it's a hard problem, and until more progress has been made, typing in Oot will be optional, so that no one has to use whatever broken type system we come up with if they don't want to.

But optional typing is there because it's very useful for IDEs (reducing viscosity) and for safety.

Laziness

Oot is lazy because it helps with succinctness by allowing infinite data structures and it helps with code reuse by allowing the definition of data structures to be separated from the tactics used to realize parts of them. I believe it helps with extensibility too, although i'm not sure how.

The main downside of laziness is that it is hard to learn to use it (and to debug it) if you are used to non-lazy languages. Another downside is that it is hard to debug memory usage.

Immutability vs. aliasing

Oot's core data structure is immutable because this helps with safety. However, there are mutable constructs also, and even references.

Oot tries to make it easy to reason about mutability by providing constructs to control and limit mutability, by providing immutable core library functions, and by making immutability the default and mutability explicit rather than implicit.

A note on referential transparency, immutability, and aliasing. Referential transparency is said to make code easier to reason about. I opine that this is true more at the macro level than the micro. Specifically, i don't see any problem with destructive variable updates, e.g. code like "x = 3; x = x + 3", or with for loops with mutable iterator variables. These constructs make code concise and they easily be replaced by referentially transparent equivalents if necessary for analysis. I opine that where referential transparency helps more is in not having to worry about side effects across functional boundaries. One type of side effect is caused by reference types across functions, e.g. it's nice to not have to remember that if you change one of the items in array x over here that you'll also be changing one of the items of array y in some different part of the code, because x and y are pointers to the same array.

Therefore, Oot is not referentially transparent within functions (mutable variables are allowed). It permits the concept of a reference (a value that can have multiple aliases), but most library functions create values, not references, and references are distinguished syntactically. The idea is that if you don't think about it, you won't find yourself using references, but if you really need them, they're there.

Other side effects are also permitted, but are also distinguished.

Lexical scope is programming in the small

It's reasonable to expect the programmer to keep track of implicit side-effects that are caused within the lexical scope that e is currently working on. There's only so much code there to read.

What's difficult to keep track of is the side-effects of code that is outside of the current lexical scope.

In this sense, within lexical scope, the language should be optimized for programming-in-the-small, and should choose conciseness over safety. Across lexical scope, however, you want to optimize for programming-in-the-large, and should choose safety and clean interfaces.

For example, closures; if your function accesses a variable defined in an enclosing lexical scope, that's okay; it's not that hard for you to look thru the enclosing scope to find everywhere the variable is set in the enclosing scope. Similarly, if an inner function, a function that you define within the current lexical scope, changes a variable, that's okay; it's not that hard to look thru the enclosed scope to find everywhere the variable is set in the enclosed scope. But, if you use a variable that was defined by whoever called you from another lexical scope, and that wasn't passed in your formal parameters, that's a little more confusing. If a subroutine that you call mutates one of your local variables, that's very confusing. This is why global variables are more dangerous than lexical closures.

It's not so confusing when mutation happens within lexical scope. This is another way to justify why we permit local mutation of local variables by default, even though we prefer immutability when passing things between functions.

One detail is that you also have to watch the "marginal safety tax rates" as you go from lexical scope to non-; for example, at first it seems like a good idea to require that all function definitions have explicit type annotations for formal parameters, but this would cause programmers to put off splitting up a big function into smaller subfunctions (especially because it would increase viscosity; e.g. if you have to change one of those types later, now you have to change it in a bunch of places instead of just one, especially if one type changed causes the types of multiple formal parameters, which were previously untyped locals, to have to be changed, because then you can't just do a single string search-and-replace in a text editor).

Object system

Oot's object system serves multiple purposes. Using delegation and inheritance, it assists in code reuse. Using instance variables, it assists with encapsulation of mutable state. Using late-binding dynamic dispatch, it assists in extensibility.

Tracking of referential transparency

As part of the type information of identifiers, Oot tracks whether the identifier is 'pure', that is, whether it is defined in a referentially transparent manner. This helps with safety and readability.

Interceptors

Interceptors (called monads in other languages) are an extensibility mechanism.

Type system

More specific design criteria for type system

A type system has three purposes: safety, conciseness, IDE support

The type system provides safety by allowing a compiler to do a proof that certain kinds of errors are not present in the program.

The type system provides conciseness by allowing the programmer to write constructs that are locally ambiguous, with the ambiguity resolved by typing information that comes from other parts of the program (for example, ad-hoc polymorphism on type).

The type system provides IDE support by allowing autocomplete.

Type system also help a lot with optimization, but that's not a major goal of Oot, although hopefully we'll be able to add this later.

There should be as little mandatory interaction with the type system as possible

The programmer should be allowed to choose not to bother with the type system to as great an extent as possible. Practically, this means that anything which is there for solely for the purpose of safety is optional, leaving only that interaction with the type system required to resolve ambiguity. Furthermore, anything which is needed to resolve ambiguity but which is guaranteed to be known at runtime before the ambiguity must be resolved, may remain optional.

Also, the programmer should be able to choose the program 'dynamically', that is, to choose the type of objects at runtime, without bothering to prove at compile time that type errors will be avoided.

What level of safety is required is up to the person invoking the compiler or interpreter

That is, regardless of what the source code says, type errors which are only relevant to safety may be suppressed by the compiler or interpreter (todo: we should definitely be able to suppress when the error is because we cannot prove that we are safe; but should we be able to suppress errors when the compiler can prove that we ARE NOT safe? mb not..)

Kinds of errors that the 'basic' part of the type system should be capable of detecting

The type system should be at least capable of detecting:

when a programmer does the equivalent of invoking a method on an object whose name is statically known and yet whose presence on that object cannot be guaranteed. For instance, invoking a filehandle 'write' method on an integer.
the 'type' system should be able to track effects, too. It should be able to guarantee pointer uniqueness (non-aliasing of references), side-effect-less-ness, and immutability, among other things.

Types should be dealt with in pieces

E.g. if a function is side-effectful and it returns a list of ints, these can be seen as two separate pieces of information, rather than Haskell's way in which the type is IO [Int].

The mandatory portion of the type system should be simple

This implies that any portion of the type system needed to resolve ambiguity should be simple.

Carrying out computations in the type system should not be necessary in order for a reader to interpret 'syntactic' seeming things

For example, Haskell's use of induction in the type system to achieve variadic functions would be considered an abomination: http://stackoverflow.com/questions/3467279/how-to-create-a-polyvariadic-haskell-function (see [1] )

Note that we are concerned here with readability; we don't want to even offer the option for programmers to use the type system in this way.

It's easy for programmers to reason in the direction of the control flow, but harder otherwise

For example, ad-hoc polymorphism depending on the types of inputs to a function is easy to understand. In order to simulate the type inference in your head for this, you just have to simulate what is happening in the program anyways, which programmers are already good at.

Furthermore, ad-hoc polymorphism in this direction, if resolved at runtime, is the same as if there was a 'switch' statement in the code at this point that switches depending upon the type of the inputs. So programmers can reason about this type system behavior in terms of something else familiar.

On the other hand, ad-hoc polymorphism depending only on the types of return arguments is more difficult to understand. It proceeds 'backwards in time', and it cannot be emulated by non-type-inferencing runtime code.

This further explains why http://stackoverflow.com/questions/3467279/how-to-create-a-polyvariadic-haskell-function is difficult to understand (see [2] ).

It's less important for programmers to understand the parts of the type system that merely verify correctness (safety), but more important for them to understand the parts that alter semantics (conciseness)

The more special syntactic forms the type system has, the harder it is to learn

The optional parts of the type system should be as expressive and extensible as possible

It's okay for parts of the type system to be complicated, as long as those parts are only used by experts to essentialy write proofs about their code, and not necessary for other people to touch (or to understand in order to resolve ambiguity in someone else's code).

Interfaces

To help with code reuse and extensibility, Oot relies heavily on interfaces, like Python and Clojure and Haskell's typeclasses (btw there are important differences in how each of those languages their 'interface' analog).

Attribute types

Rather than speak of 'the' type of a value, the system is concerned with tracking multiple attributes of that value. A reference to a function may be unique, the function may be side-effectful, and it may return a list of ints. Programmers are able to write annotations that speaks of one of these facts in isolation without knowing about the others.

Ad-hoc polymorphism based on type

Restrictions on return-type-only ad-hoc polymorphism

Since ad-hoc polymorphism deals with ambiguity, not just safety, it is mandatory for readers to understand what the type system is doing in this case. Return-type-only ad-hoc polymorphism requires complex reasoning that cannot be mimiced by a simple runtime type check.

Therefore we choose to restrict (or possible even eliminate) this form of ambiguity resolution.

Promise/demand

Expanding the concept of 'assertion', Oot offers statements to promise that some conditional is True, and statements to demand that some conditional is True. The demand allows the programmer to express arbitrary invariants. The promise allows the programmer to tell the typechecker to assume that some invariant has been met, allowing the programmer to meet demands even when they are unable to formulate a proof for the typechecker that the demand has been met.

The invoker of the compiler or interpreter can tell Oot to treat promises and demands as runtime assertions, or to ignore them at runtime, or even to ignore them or subsets of them at compile time.

(also, todo i think we subsume existential types under this, but need to doublecheck)

Structural typing with net matching expressions

This allows the language of the type checker to be unified with a core Oot concept.

The typesystem is identical to the runtime constraint logic programming system

Types specify semantics, but not implementation

For example, the type system won't prevent you from using linked lists instead of byte arrays for your strings, provided both implementations support the same signature and claim to support the same semantics.

Note that a type may specify more than a signature, however; e.g. a function can have an opaque property that indicates that its semantics are such-and-such, which provides more informationa about its semantics than just its signature.

Type inferencing algorithm is modular/swappable

This allows the user to replace the type inferencing algorithm with a more powerful one, or one more suited to their domain, to assist is compiler-time verification of safety. note that the language of the typesystem, not just its behavior, can be modified.

modifying the language of the typesystem requires a high 'language difficulty level', because it makes things harder to read. Also, it could be abused (by providing an incorrect algorithm) to make ad-hoc polymorphism resolve in an unexpected way, causing the interpreter to behave differently from the compiler (e.g. to tell the compiler than something will be an Int when it will actually be a String, causing the compiler to choose polymorphism as if it would be an int, causing different behavior than the interpreter, who chooses the string version at runtime).

Implementation (including further aspects of semantics which require significant consideration in the implementation)

Platform targets

Embeddable via C on GNU/Linux
Can be made into shared libraries on GNU/Linux
Javascript
JVM (and particularly JVM Java)
CLR (and particularly CLR C#)
Python

Implementation written in Oot, mostly using Oot metaprogramming

This provides various benefits. First, it allows Oot to be efficiently improved, since we work on it in such a great high-level language. Second, it inspires us to create good metaprogramming facilities. Third, it ensures that the time spent working on the implementation is also time spent discovering Oot's warts, which should help improve the language in the early days. Fourth, it allows us to more easily port the implementation to new platforms. Fifth, in ensures that the entire Oot community is capable of understanding (and hence of contributing to) the Oot implementation.

Garbage collection

Oot is garbage-collected, as this increases succinctness and safety. The GC algorithm should be modular (e.g. the user can choose which garbage collector is used), but the default should emphasize low latency, e.g. very short stop-the-world pauses, at the expense of lower throughput (this increases safety at the cost of performance).

I think "concurrent" generational GCs are good for this? Also, i've never noticed any pauses with Python, which uses reference counting, even though i've heard that in theory a single deallocation could cause an unbounded pause as reference counts are recursively decremented (couldn't this be done in a separate thread, though, rather than all at once, with no pauses at all?) I've also heard that hybrid generational garbage collectors that combine tracing for young generations with reference counting for old ones may be a good idea (Ulterior Reference Counting? although the paper still shows significant maximium pause times there).

Continuations

Oot has continuations, as this increases extensibility.

Threading

Unlike Python, there is no Global Interpreter Lock; multiple Oot threads can coexist within the same process.

Oot has green threads/coroutines/goroutines (suitable for the creation of a massive number of threads).

true tail-call optimization?

Like Clojure, Oot may require a 'recur' directive to specify a tail call, it does not automatically detect and optimize tail calls. This is because if even Clojure felt that was a necessary concession to performance on contemporary VMs, then it probably is.

However, i've also heard that TCO is necessary for practical true continuations. i haven't thought through it yet myself. If so, then we'll have to do it.

Massive parallellism

I'd like Oot to be suitable for massive parallelism, e.g. the Connection Machine. I am intrigued by this architecture because:

i think the brain has a massively parallel architecture
parallel processing algorithms are complex with an intermediate number of processors, but if the number of processors is on the order of the number of data elements, they get simple again (these algorithms use the paradigm called "active data structures" in the book The Connection Machine)
the exponential increase in computer speed seems to have ended if interpreted in terms of increase of serial processing speed, but it still continues if interpreted in terms of number of CPUs times speed (e.g. total parallel operations per second)

However, due to my lack of experience programming in such situations i'm sure i'll get this wrong.

One canonical implementation

One theory for why Common Lisp did not take over the world is that the lack of a canonical implemntation caused splintering ( http://news.ycombinator.com/item?id=5031802 ). In other words, it is postulated that a canonical implementation greatly increases communal gravity.

Oot will have a single canonical implementation, which will be written in Oot; however, other implementations will be possible (just as Python has Jython).

Callable from C?

Not sure if we will support this. The idea is that we should be able to produce shared libraries that other programs written in C can use without knowing anything about Oot. Apparently many higher-level languages have trouble with this. I don't quite understand what is required in order to achieve this, so i can't decide if it's worth it. Some possible prerequisites that i can think of are:

Being able to input/output data in C-style representations. This seems doable.
Being able to manually allocate/deallocate memory, because the guy calling your library may not have the courtesy to call your garbage collector later (not sure why that should matter tho.. couldn't you just call the collector upon exit from your code? other that than, let the C program dellocate what you give them, that's what they have to do anyways, right?).
Not messing up the stack. Apparently some languages have 'segmented stacks' or 'split stacks' and apparently this may be an issue?

looking at Oot's 'multiple ortho bases' design goal, here is a note:

if something is a simple composition of two things, and those two things would make good bases by themselves, then have them separately and not together

and here are some examples:

NAND by itself or NOR by itself, plus constants, are a basis for boolean logic, but instead we use the AND, OR, NOT basis, and we augment it with XOR; because NOR(x,y) = not (AND(x,y)), and NAND(x,y) = not (OR(x,y)), which are just simple compositions of two things; but XOR is not a simple composition of only two nodes of AND, OR, NOT (eg XOR(x,y) = (x AND (NOT y)) OR ((NOT x) AND y), which is 5 intermediate nodes, not 2.
we have WHILE and SWITCH, rather than combining them into one construct such as SPIN Promela's 'do', because SPIN Promela's 'do' is just a WHILE loop around a SWITCH