proj-oot-ootNotes14


lisper 1 day ago

One cool thing about Lisp is that you can easily embed new languages in it, and those languages can be small and beautiful. For example, I have a Python-esque FOR macro that uses an iterator protocol, and a universal binding macro that subsumes all of Common Lisp's binding constructs (LET, LET*, LABELS, FLET, MULTIPLE-VALUE-BIND, etc.) So for me, Common Lisp has actually shrunk without losing any functionality. This is not possible in languages without macros. Such languages are indeed doomed to either grow forever, or change in non-backwards-compatible ways (e.g. Python3). But with macros you can shrink a language as well as grow it. This is one of the reasons Common Lisp continues to thrive.

[UPDATE]: You can find my code here: https://github.com/rongarret/ergolib

Also, I forgot to mention another language-shrinker included in that library: REF. REF is a universal de-referencer that subsumes NTH, ELT, SLOT-VALUE, GETHASH and probably a few other things that I can't remember right now. It also lets you build abstract associative maps (a.k.a. dictionaries) with interchangeable implementations (see the DICTIONARY module), which lets you get rid of ASSOC and GETF.

reply

my comments:


jlarocco 19 hours ago ... In Lisp, I can iterate and loop using the built-in "loop" construct, or I can use the more convenient control structures defined in the Iterate library.

sparkie 1 day ago

Haskell is really the kind of exploding language being discussed in the article though. While it has simple toggle switches for the end user, each of the underlying "extensions" is really a large modification to the compiler. There's still a monolithic parser, and authors of language extensions need to take into account all the other extensions to make sure they play nicely together. It's like they've heard of the open/closed principle but don't know how to apply it.

Specific extensions (like Safe/Trusted), or disabling record syntax etc are demonstrative of the problem - they're quite specific and implemented as part of the compiler because the language itself does not have the means of selectively enabling/disabling features via runtime code.

Kernel[1] has a much more interesting model based on first-class environments. One can selectively expose parts of the language to any execution context by creating a first-class environment containing only the required bindings, then execute some code in that environment. It provides a function ($remote-eval (code-to-exec) environment) for this purpose.

To give a very trivial example of this, let's say I want to expose only the numeric operators +, -, etc. to some "safe calculator" context. I can simply bind these symbols to their kernel-standard-environment ones and be done.

      ($define! calc-env ($bindings->environment (+ +) (- -) (* *) ...))
      ($remote-eval (+ (* 2 3) 1) calc-env)

Trying to do something like unsafe "read-file" in place of (+ (* 2 3) 1) will result in a runtime error because the binding read-file doesn't exist in that environment.

There's much more interesting stuff to be found in Kernel if you like the "small core" based approach to computing. Kernel is more schemy than scheme. Compiler hacks like "macros" are an example of something you wouldn't want lying around in your exploding language when you can implement them trivially with first class operatives. And why would you want quote implemented as a language primitive? Yuck!

[1]:http://web.cs.wpi.edu/~jshutt/kernel.html

reply

lmm 22 hours ago

Not true. If a language has adequate general constructs then you can replace language features with ordinary code written in the language. E.g. in Scala no-one uses "return" any more, because you can get the same functionality (in a better / more consistent way) using library types - just ordinary classes with ordinary methods, no macros needed.

justthistime_ 11 hours ago

I think we definitely found the Common Lisper.

reply

lisper 22 hours ago

> in Scala no-one uses "return" any more

Huh??? Does Scala even have a "return" statement? I can't find it in the docs.

reply

draven 18 hours ago

There is: http://scala-lang.org/files/archive/spec/2.11/06-expressions...

The general consensus is: don't use it.

reply

lisper 12 hours ago

Ah, there it is. OK, so I'm still confused.

> you can get the same functionality (in a better / more consistent way) using library types

RETURN is a control construct. How do you emulate a control construct using types?

reply

lmm 19 hours ago

A lot of things that seem like they would need compiler support actually don't. Look at Spire.

reply

https://github.com/non/spire


PaulHoule 1 day ago

In the case of Java it is not that Java is a "large language", it is that to get anything useful done with Java you need to know about Maven and Spring and Log4J and Apache Commons Logging and SLF4J (because if you're using a lot of libraries surely all of those will be in use.)

That is, it is the complexity of the ecosystem, not of the language.

reply

falcolas 1 day ago

When I was learning Java recently in anticipation of a job programming Java - I was surprised by this reality. The core of Java is remarkably simple to learn - there's not really all that much to it.

The complexity is indeed in all of the libraries and build frameworks and well-intentioned but silly HammerFactoryFactoryFactoryFactories.

reply

jerf 1 day ago

I think the case can be made that Java was too simple. Its inability to express very much within itself is what led to the explosion of external tools to make it "better", or indeed, "work".

leoc 18 hours ago

> I think the case can be made that Java was too simple. Its inability to express very much within itself is what led to the explosion of external tools to make it "better", or indeed, "work".

Yes, Java (IIRC) originally aspired to be a small, simple language with a few honest constructs that everyone would understand and use. Of course—speaking of CL!—this is Greenspun's Tenth Rule at work. To be fair, that's not necessarily to say that blowing the syntactic budget on a for construct is a good decision, and there are very good reasons to let language design take place in a marketplace of extensions rather than in a centrally-mandated core language. But if the idea is that mandating a simple language means the language as people use it will necessarily be simple and uniform, then no.

(Doing my crazy-man turn for a moment: this is just one manifestation of a much wider problem. The idea that pushing unavoidable but unwelcome complexity (or unreliability or untrustworthiness, or things like only-partial support for interfaces) in-band is equivalent to making it somehow go away is the great all-pervading madness that afflicts computing. "As simple as possible, but no simpler"...)


reply

> That is, it is the complexity of the ecosystem, not of the language.

sbilstein 1 day ago

I think this is a very fair way to characterize Java's problems. The syntax is annoying but generally lets you get tons of shit done in a reasonable manner. It lacks the features of a stronger type system like Haskell's or Scala's, but you can get pretty far.

The ecosystem on the other hand can be totally befuddling: Maven, Gradle, the dozen or so DI frameworks and the various codebases that seem to use all of them, choosing between Apache or Google Java libs (or both!), etc.

Scala on the other hand suffers from an explosion of language features that means you either get a ton of shit done because you love Scala or you get nothing done. I'm not sure which is better anymore since I work in both on a daily basis but it's a different tradeoff.

reply

edwinnathaniel 1 day ago

> The syntax is annoying

That's a matter of perspective no? I'm not fond of symbols so reading Ruby super-terse code makes me choose watching a movie on Netflix over that.

> The ecosystem on the other hand can be totally befuddling

You mean slightly better than Python that keeps re-inventing the (half) wheel? :D

Maven is used by the majority of projects, with Android projects as an exception because Google pushed hard for Gradle.

For DI frameworks: Spring is the majority winner with Guice/CDI in second place.

Apache vs Google Guava is only an issue because Guava came in late, and both are just nice small libraries (not frameworks). Older code within the codebase might have already used the Apache Commons lib and newer code within the _same_ codebase will more likely use Guava where it fits (I/O is an area where Apache has the better library).

We should also compare this situation with the various Auth & Auth libs for Rails/NodeJS projects :).

So, shrug... Java has been around longer; usually there are at most 2 competing libraries for a certain area and the better ones tend to win (again, it depends on your perspective of what "better" means: some prefer Maven over Gradle).

reply

edwinnathaniel 1 day ago

Isn't that the same with _any_ ecosystem?

Ruby => Rails (most of the time...)
Python => Django
NodeJS => ExpressJS

Ruby => RubyGems + Rake + Bundler
Python => (finally something ... static) pip
NodeJS => NPM
Browser JS => Bower

NodeJS tries to be as simple as possible but at the end of the day, you need to use/download/learn libraries with different quality/documentation levels and different API-feel/code-style.

reply

lmm 22 hours ago

Java libraries tend to have "magic" that alters the language semantics. E.g. Spring's dependency injection breaks your reasoning about how an object is constructed. Hibernate breaks your reasoning about when object fields can change. Tapestry breaks your reasoning about basically everything. There's a difference between a library that follows the rules of the language and a framework that changes them.

(admittedly to a certain extent I've heard the same said of rails)

reply

reilly3000 23 hours ago

Except it isn't. Attributes of languages include their documentation syntax and extension API. Getting this right makes a remarkable difference for how easily a person can grok a new library or extension and make it useful. A good language has a common "language" overall, not just code syntax.

reply


 bad_user 18 hours ago

> Scala on the other hand suffers from an explosion of language features

This isn't what I feel. Scala's feature set is small, but powerful. Some examples:

And then indeed, we can talk about things like pattern matching or case classes, which in my opinion add tremendous value. But you know, static languages need features in order to be usable / expressive and cannot be minimal in the way that Scheme or Smalltalk are. For example people complain about implicit parameters, however implicit parameters happen anyway in any language (e.g. undocumented dependencies, singletons) and at the very least in Scala you can document those dependencies in the function's or the constructor's signature and have it statically type-checked and overridable. Plus implicit parameters allow one to work with type-classes and compared to Haskell, in Scala a type-class is just a plain interface and its implementation is just a value. And also the CanBuildFrom pattern is not a type-class and isn't possible in Haskell. So a small feature such as implicit parameters yields tremendous power.

I could probably go on; I just wanted to point out that the notion of Java's simplicity and, at the same time, Scala's complexity is entirely misleading. Also, I happened to introduce many rookies to Scala and by far the biggest hurdles are posed by exposure to new concepts or design patterns, brought by functional programming of course. Even explaining Future is problematic, a standard library thing that has otherwise leaked into Java and many other languages as well.

reply


jdmichal 1 day ago

I would agree with you, except there's no way to fit type erasure into that sentiment. Type erasure is a complex solution to an easy problem, done purely out of laziness and a broken sense of what "backwards compatible" should mean. The moment you try to do anything "interesting" with generics, you realize the sham that they are and start passing around `Class<T>`, which is exactly what you would have done before generics anyway.

reply

fithisux 20 hours ago

Type erasure kills the language. Otherwise, it would have a bright future.

reply

---

orthecreedence 1 day ago

I think lisp could benefit from a small core and building out a standard library. You could pack all the features it needs (packaging, lexical/dynamic scoping (defvar), let/lambda, defun/defmacro, multiple values (via values, multiple-value-call), setf (w/ setf expansion), simple arithmetic, declare/declaim/proclaim, maybe a few more) into the core and have standard libraries: cl.bind (multiple-value-..., defparameter, etc), cl.math (sin, cos, etc), cl.clos, cl.collections (arrays, hash tables), cl.io, etc etc.

I think this would clean things up a lot, still preserve the spec (aside from documenting what's in which libs), and make things more approachable.

Shoving everything into the "common-lisp" package works but it's cumbersome and you have to have the entire language sitting there to use anything.

reply

---

tjr 1 day ago

I don't have the exact quote/source right here handy, but I believe that was Guy Steele's intention with Scheme.

reply

Jtsummers 1 day ago

https://en.wikipedia.org/wiki/Scheme_(programming_language)#...

This was a concept for R6RS (not sure what happened, apparently some controversy with it) and R7RS has (attempted? succeeded?) in going in this direction.

reply

duaneb 1 day ago

IIRC R6RS was deemed too modular for not much reason while abandoning backwards compatibility. Thus, R7RS was split into small/large specs, and largely builds on R5RS.

reply

bitwize 1 day ago

R6RS was the systemd of language standards. It went against the very philosophy of the language it purported to standardize, and was basically a prescriptive standard based on a few influential individuals' notion of what Scheme "should" be.

That's another reason why I remain unswayed in my detestation for systemd: I'd seen this movie before and I don't like how it ends.

reply

Blackthorn 1 day ago

I think nobody's efforts there have really caught on because Common Lisp as a community is hopelessly conservative.

At least Lisp as a whole isn't -- Clojure is a hell of a lot simpler than CL, though even it now has someone with a "let's separate things out into different packages" project (called Dunaj -- I actually think it's a pretty good idea, but nobody's really talked about it).

reply

bitwize 1 day ago

This exists; it's called the R7RS standard for Scheme. It basically divides Scheme into two languages: a small core with a basic module system, and a more extensive language with a robust standard library packaged as modules.

reply

---

chubot 1 day ago

I'm glad someone said this. I'm an occasional JavaScript programmer, and I looked over ES6 last night and was surprised by how large it's become. And I learned that ES7 is already on the way.

That said, most of the features seem nice, and many are borrowed from stable languages like Python, so perhaps it's not too much. I'll have to try it and see.

It made me wonder what Crockford is up to, and what he thinks of this.

https://github.com/lukehoban/es6features

http://es6-features.org/

reply

mkozlows 1 day ago

Yeah, I share the linked author's opinion of ES6 -- it's good stuff, but it's also a dangerous direction.

Part of me thinks that maybe what's needed is an updated version of "use strict" -- "use es6" or whatever -- that would let you use the new features, but also prevent you from using some deprecated features, to keep the surface of the language somewhat smaller even as new stuff gets added.

reply

dangoor 17 hours ago

That was seriously considered some years back and thrown out as likely to cause poor adoption and poor intermingling of language features.

http://www.2ality.com/2014/12/one-javascript.html

reply

warfangle 1 day ago

It's expanding the language, but adding such sorely needed features.

I've been working in it for a little while now, and egads is it painful to go back. Block scoping, arrow functions, and destructured assignments are all a godsend.

reply

---

dschiptsov 1 day ago

The Arc language (which runs this site) is a remarkable attempt to "fix" what went wrong with CL. It lacks a decent runtime and native code compiler (it offloads everything to mzscheme, the way Clojure did to the JVM) but it is already more than a proof of concept.

The problem is that there are no more DoD or other grants for creating new Lisps (particularly due to Java mass hysteria and the prevalence of a packer's mentality).

BTW, making something similar to SBCL (everything written in itself, except for a tiny kernel written in C) for Arc (a core language, without the kitchen sink syndrome) is of moderate difficulty compared to meaningless piling up of more and more of Java crap.

reply

---

newuser88273 22 hours ago

Common Lisp actually has a core of a mere thirteen "special operators". You can think of everything else as standard library.

reply

---

Grue3 15 hours ago

Interestingly hardly anybody uses Algol, Smalltalk, Pascal and early Scheme anymore, while people still use Common Lisp. Perhaps "being small and beautiful" is actually a bad thing for a programming language?

reply

pjmlp 23 hours ago

The common fallacy about simple languages.

Yes, the language might be simple to understand, but then the result is that the complexity lands on the shoulders of developers and an ever-increasing library of workarounds to compensate for missing features.

Hence every simple language that achieves mainstream use ends up becoming like the ones it intended to replace.

reply

raymondh 23 hours ago

Python is also suffering in this regard. It has moved from being "a language that fits in your head" to a language where very few people on the planet know most of what's in it.

reply

upofadown 1 day ago

Whenever I encounter people arguing about Python3 I am reminded that I still miss Python1.

reply

ericbb 1 day ago

Here's a title for a rebuttal in case anybody wants to write it. ;)

The Tragedy of ISLISP, Or, Why Small Languages Implode

reply

---

 TazeTSchnitzel 1 day ago

ECMAScript 6 is a shame; there's a lot of stuff added which is unnecessary.

"let" is unnecessary. JS now has two kinds of variable scoping! "var"'s hoisting is annoying, sure, but we don't need two kinds of variable scope. If you want to scope something to a block, you can just use an IIFE.

"class" is unnecessary at best. JavaScript has a bunch of ways of constructing objects to choose from, and that's not a problem. Why lock users into one paradigm, and obscure what's actually happening underneath? This will just confuse people when they have to deal with code that doesn't use "class" syntax or the OOP model it presents.

Object property shorthand is confusing. Why the hell is {bar} equivalent to {bar: bar}? Isn't that a set literal (Python, math)? Why isn't there the colon, if it's an object? What the hell? Try explaining that to newcomers.

Computed property names looks weird and is misleading. You'd logically expect {[1+1]:2} to be an object with an Array (coërced to string?) key, because [] is an Array literal. But instead it means "compute this expression". In which case, why isn't it ()? That's what you'd intuitively expect. I've tried to use () before and was surprised it didn't work, even.

Method properties, e.g. { foo(a,b) { ... } }, are unnecessary given => functions.

All that being said, I think ES6 has some quite positive additions. Maps, sets, tail-call elimination, =>, modules and symbols are all very important and useful features JavaScript really needed.

reply

Arnavion 20 hours ago

>If you want to scope something to a block, you can just use an IIFE.

Now try doing that in a loop that you want to break out of. Edit: To save you the trouble - https://github.com/babel/babel/issues/644

>"class" is unnecessary at best. ... Why lock users into one paradigm?

It canonicalizes one of the popular ways of doing classes (the other being the same but without `new`).

>Why the hell is {bar} equivalent to {bar: bar}? Isn't that a set literal (Python, math)?

{ ... } in JS has never meant set literals. It does however mean objects (dictionary literals) which is also how Python uses it.

>You'd logically expect {[1+1]:2} to be an object with an Array (coërced to string?) key, because [] is an Array literal.

[] has also always been used to index objects and arrays, so using it when generating the object with keys and values follows as an extension of that.

>Method properties, e.g. { foo(a,b) { ... } }, are unnecessary given => functions.

Arrow functions capture lexical this, which method properties do not. Compare `({ x: 5, foo() { return this.x; } }).foo()` with `({ x: 5, foo: () => this.x }).foo()`. Arrow functions also do not have an arguments object.

reply

TazeTSchnitzel 14 hours ago

> Now try doing that in a loop that you want to break out of.

Fair point, although this can be worked around. Though it begs the question of why you need block scoping anyway. If you have a function large enough to need it, you should probably break it down into smaller functions, and compose them.

> It canonicalizes one of the popular ways of doing classes

But there are other popular ways, and this way new users will have the underlying details hidden from them, meaning they'll encounter problems later. It's also potentially misleading.

> { ... } in JS has never meant set literals.

Yes, but it's never been { a, b } - there's always been a colon. Python also uses {} for dictionaries, but with colons. Having { a } magically use the variable's name as a key name, and also use the variable's value, is unintuitive. { a, b } in another language would be an array (C, C++) or a set literal (Python, mathematics). Nobody would expect it to do what it does here in ES6.

> [] has also always been used to index objects and arrays, so using it when generating the object with keys and values follows as an extension of that.

I suppose that makes some sense, but we don't use [] for string keys in literals.

> Arrow functions capture lexical this, which method properties do not.

Oh, right, good point.

reply

my comments:

note my distaste for the complexity of JS ES6 syntax


Picon the expressive power of ambiguity

> The answer denotes ambiguously. ;-) Ah, McCarthy's AMB operator, and non-deterministic lambda-calculus!

Did you know that in the presence of AMB, call-by-name is basically useless (and macro-expressible with call-by-value), but is better generalized by call-by-future (which isn't macro-expressible with call-by-value, in the absence of side-effects)? And that with both call-by-value and call-by-future, in a pure lambda-calculus, you can express the operational may-semantics of logic-and-functional programming?

And did you know that with such a system, you can express logical reflection whereby the system can express everything about the logical properties of programs in as expressive an intuitionistic logic as you like? And that with such logical reflection, you can program-by-specification, asking the system to automatically find you a program that satisfies your logical specification? Of course, the obvious naive implementation that iterates blindly over proofs won't usually give you an answer within any reasonable amount of time.

My oh my, why didn't I ever take the time to get my paper accepted by a journal? But is there any journal interested in such things? Or are they too well-known already?

[ François-René ÐVB Rideau

Reflection&Cybernethics http://fare.tunes.org ]

---

do we want these?

---

ok, how about: the optional/keyword return values of a function ARE the additional facts; the keywords are the variables being bound and the values returned are the values to be bound; the primary return value is not a fact but just a value; so the primary return value is distinguished

---

this Paul Graham quote is interesting:

" A friend of mine who knows nearly all the widely used languages uses Python for most of his projects. He says the main reason is that he likes the way source code looks. That may seem a frivolous reason to choose one language over another. But it is not so frivolous as it sounds: when you program, you spend more time reading code than writing it. You push blobs of source code around the way a sculptor does blobs of clay. So a language that makes source code ugly is maddening to an exacting programmer, as clay full of lumps would be to a sculptor.

At the mention of ugly source code, people will of course think of Perl. But the superficial ugliness of Perl is not the sort I mean. Real ugliness is not harsh-looking syntax, but having to build programs out of the wrong concepts. Perl may look like a cartoon character swearing, but there are cases where it surpasses Python conceptually. "

does this support or oppose the hypothesis that syntax matters (as opposed to " Choice of lexical syntax is arbitrary, uninteresting, and quite often distracts from actual substance in comparative language discussion. If there is one central theme is that the design of the core calculus should drive development, not the frontend language." [1])? The first paragraph seems to support this. But in the second paragraph, Graham says he was talking about conceptual (semantic) ugliness, not syntactic.

However, Graham himself, and the 'friend of mine' whom he presents as evidence, have different preferences. The 'friend of mine' prefers Python. Graham prefers Lisp. So it seems likely that the friend really does care about syntax, whereas Graham does not.

---

so we have:

firstclass facts and mutations, firstclass side-effects, firstclass patterns, firstclass variables, firstclass fexprs

---

" When it comes to writing code, the number one most important skill is how to keep a tangle of features from collapsing under the weight of its own complexity. I’ve worked on large telecommunications systems, console games, blogging software, a bunch of personal tools, and very rarely is there some tricky data structure or algorithm that casts a looming shadow over everything else. But there’s always lots of state to keep track of, rearranging of values, handling special cases, and carefully working out how all the pieces of a system interact. To a great extent the act of coding is one of organization. Refactoring. Simplifying. Figuring out how to remove extraneous manipulations here and there. " -- http://webcache.googleusercontent.com/search?strip=1&q=cache:http%3A%2F%2Fwww.johndcook.com%2Fblog%2F2015%2F06%2F18%2Fmost-important-skill-in-software%2F

---

" app.get('/', (req, res) => { try { await user = db.get({user: req.user}); res.send({user: user}); } catch { res.send(400); } }); " -- http://lebo.io/2015/03/02/steve-yegges-next-big-language-revisited.html

---

" import List(partition)

  qs [] = []
  qs (pivot:rest) = qs smaller ++ pivot : qs larger
      where (smaller,larger) = partition (< pivot) rest

Haskell 98, the (polymorphic) type is "Ord a => [a] -> [a]"

"

---

WS* standards can be quite complicated to use for people that don't know them. The suggestion to use REST may work depending on your requirements - if you don't need security, strong typing, built-in validation, sessions and so on. If you do, or you have a heavy investment in SOAP web services already, then using these optional standards is the way to go – blowdart May 13 '09 at 10:45

---

lmm 6 hours ago

The big problem I've found is serialization of sum types. thrift (or protobuf or any number of similar systems) is very good at serializing most of the data one tends to work with, with declarations that are suitably strict but easy to write. But it doesn't have a good way to represent "this field is an A or a B".

reply

coderjames 2 hours ago

Protobuf has somewhat recently added the 'oneof' feature which tries to deal with these union types.

The one downside I found when trying to use it was that the 3rd-party C# Protobuf libraries didn't yet support it.

See: https://developers.google.com/protocol-buffers/docs/proto#on...

reply

---

 eli_gottlieb 4 hours ago

>On the other hand, creating systems from programs written by different people requires more flexible interfaces that can deal with versioning, backward and forward compatibility, etc., something that is extremely difficult to do across programming languages without heaping on massive complexity (CORBA, WS-deathstar, ...)

Personally, I think the LLVM project shows what the Right Thing is here: package most of the real functionality into libraries while putting the user interfaces into runnable programs. Everyone who wants to communicate with the functionality can just link the library.

And if we want to do it statefully, well, handling global state across a large system is basically the unsolved problem of programming.

reply

---

"Functions and programs are certainly not an either-or kind of thing but making them one and the same would require a very different kind of operating system than we have ever seen. Say, emacs OS :)"


Guido likes functions to return from only one place, e.g. not using exceptions to return ordinary values:

"I button-holed Guido and pressed him to tell me why he thought raising Response objects was a bad idea. “It violates the principle that you should return from only one place,”"


questions from http://www.cl.cam.ac.uk/~pes20/cerberus/notes51-2015-06-21-survey-short.html

" [1/15] How predictable are reads from padding bytes?

If you zero all bytes of a struct and then write some of its members, do reads of the padding return zero? (e.g. for a bytewise CAS or hash of the struct, or to know that no security-relevant data has leaked into them.)

...

It remains unclear what behaviour compilers currently provide (or should provide) for this. We see four main alternatives:

a) Structure copies might copy padding, but structure member writes never touch padding.

b) Structure member writes might write zeros over subsequent padding.

c) Structure member writes might write arbitrary values over subsequent padding, with reads seeing stable results.

d) Padding bytes are regarded as always holding unspecified values, irrespective of any byte writes to them, and so reads of them might return arbitrary and unstable values (in which case the compiler could arbitrarily write to padding at any point, as that would be masked by this).

...

[2/15] Uninitialised values

Is reading an uninitialised variable or struct member (with a current mainstream compiler):

(This might either be due to a bug or be intentional, e.g. when copying a partially initialised struct, or to output, hash, or set some bits of a value that may have been partially initialised.)

a) undefined behaviour (meaning that the compiler is free to arbitrarily miscompile the program, with or without a warning), ... d) going to give an arbitrary but stable value (with the same value if you read again).

... The survey responses are dominated by the (a) "undefined behaviour" and (d) "arbitrary but stable" options.

...

[3/15] Can one use pointer arithmetic between separately allocated C objects?

If you calculate an offset between two separately allocated C memory objects (e.g. malloc'd regions or global or local variables) by pointer subtraction, can you make a usable pointer to the second by adding the offset to the address of the first?

[4/15] Is pointer equality sensitive to their original allocation sites?

For two pointers derived from the addresses of two separate allocations, will equality testing (with ==) of them just compare their runtime values, or might it take their original allocations into account and assume that they do not alias, even if they happen to have the same runtime value? (for current mainstream compilers)

[5/15] Can pointer values be copied indirectly?

Can you make a usable copy of a pointer by copying its representation bytes with code that indirectly computes the identity function on them, e.g. writing the pointer value to a file and then reading it back, and using compression or encryption on the way?

[6/15] Pointer comparison at different types

Can one do == comparison between pointers to objects of different types (e.g. pointers to int, float, and different struct types)?

[7/15] Pointer comparison across different allocations

Can one do < comparison between pointers to separately allocated objects?

[8/15] Pointer values after lifetime end

Can you inspect (e.g. by comparing with ==) the value of a pointer to an object after the object itself has been free'd or its scope has ended?

[9/15] Pointer arithmetic

Can you (transiently) construct an out-of-bounds pointer value (e.g. before the beginning of an array, or more than one-past its end) by pointer arithmetic, so long as later arithmetic makes it in-bounds before it is used to access memory?

[10/15] Pointer casts

Given two structure types that have the same initial members, can you use a pointer of one type to access the initial members of a value of the other?

[11/15] Using unsigned char arrays

Can an unsigned character array be used (in the same way as a malloc’d region) to hold values of other types?

[12/15] Null pointers from non-constant expressions

Can you make a null pointer by casting from an expression that isn't a constant but that evaluates to 0?

[13/15] Null pointer representations

Can null pointers be assumed to be represented with 0?

[14/15] Overlarge representation reads

Can one read the byte representation of a struct as aligned words without regard for the fact that its extent might not include all of the last word?

[15/15] Union type punning

When is type punning - writing one union member and then reading it as a different member, thereby reinterpreting its representation bytes - guaranteed to work (without confusing the compiler analysis and optimisation passes)?

...

There were many comments here which are hard to summarise. Many mention integer overflow and the behaviour of shift operators. "
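
To make question [1/15] concrete, here is a small sketch of my own (assuming a typical ABI where int is 4-byte aligned, so struct s has three padding bytes after c):

    #include <string.h>

    struct s { char c; int i; };   /* three padding bytes after c on most ABIs */

    int padding_still_zero(void) {
        struct s v;
        unsigned char bytes[sizeof v];
        memset(&v, 0, sizeof v);   /* zero every byte, including padding */
        v.c = 'x';                 /* member writes may or may not scribble on padding */
        v.i = 42;
        memcpy(bytes, &v, sizeof v);
        /* Under alternative (a) the padding bytes are still zero; under (c)
           or (d) they may hold arbitrary values, so a bytewise hash or CAS
           of the whole struct is not reliable. */
        return bytes[1] == 0 && bytes[2] == 0 && bytes[3] == 0;
    }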

pcwalton 13 hours ago

Well, one of the interesting takeaways from the article is that C as specified often does not let you take advantage of your system architecture. The specification has things like GPUs and segmented memory architectures in mind when it forbids you from doing seemingly reasonable things like taking the difference between the addresses of two separately-allocated objects, even though chances are very good what you're trying to do works just fine on all architectures you care about.

reply

... I don't mean that sarcastically; pcwalton's sibling message only begins to mention the ways in which C is not a match to modern systems. Vector processing, NUMA, umpteen caching layers, CPU features galore... it's not really a match to the "architecture" any more.

arielby 13 hours ago

C matches NUMA and caching just as well as assembly does. Basically the only thing C doesn't really expose is SIMD, but compiling "high-level" (i.e. C-level or above) code into SIMD is an open research problem (AFAIK) - I mean, you can easily compile array programs into SIMD, but the more general case is still open.

reply

if pointer arithmetic and knowledge of the byte representation (including alignment and padding) of other types are disallowed, the following are left:

" [2/15] Uninitialised values

Is reading an uninitialised variable or struct member (with a current mainstream compiler):

(This might either be due to a bug or be intentional, e.g. when copying a partially initialised struct, or to output, hash, or set some bits of a value that may have been partially initialised.)

a) undefined behaviour (meaning that the compiler is free to arbitrarily miscompile the program, with or without a warning), ... d) going to give an arbitrary but stable value (with the same value if you read again).

... The survey responses are dominated by the (a) "undefined behaviour" and (d) "arbitrary but stable" options.

...

[5/15] Can pointer values be copied indirectly?

Can you make a usable copy of a pointer by copying its representation bytes with code that indirectly computes the identity function on them, e.g. writing the pointer value to a file and then reading it back, and using compression or encryption on the way?

[6/15] Pointer comparison at different types

Can one do == comparison between pointers to objects of different types (e.g. pointers to int, float, and different struct types)?

[7/15] Pointer comparison across different allocations

Can one do < comparison between pointers to separately allocated objects?

[8/15] Pointer values after lifetime end

Can you inspect (e.g. by comparing with ==) the value of a pointer to an object after the object itself has been free'd or its scope has ended? "
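
To make the free'd-pointer question concrete, a sketch of my own: the dangling value can legitimately collide with a later allocation, and the standard makes even the comparison indeterminate:

    #include <stdlib.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        free(p);                      /* p now dangles */
        int *q = malloc(sizeof *q);   /* may be handed back the same address */
        /* Per the C standard the value of p is indeterminate after free(),
           so even this comparison is not well-defined; in practice it
           compares the stale address and may "match" q spuriously. */
        int same = (p == q);
        free(q);
        return same;
    }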

in summary:

my guess for oot:

does reading from an uninitialized value cause an error, or return an arbitrary but stable value?

if you serialize the representation of a pointer and then unserialize it, what happens?

can you compare values of different types for pointer equality? can you compare for structural equality?

can you do < comparison of pointers (even across separately allocated objects)?

"inspect (e.g. by comparing with ==) the value of a pointer to an object after the object itself has been free'd or its scope has ended?"

---

http://blog.regehr.org/archives/1180

"

    The value of a pointer to an object whose lifetime has ended remains the same as it was when the object was alive.
    Signed integer overflow results in two’s complement wrapping behavior at the bitwidth of the promoted type.
    Shift by negative or shift-past-bitwidth produces an unspecified result.
    Reading from an invalid pointer either traps or produces an unspecified value. In particular, all but the most arcane hardware platforms can produce a trap when dereferencing a null pointer, and the compiler should preserve this behavior.
    Division-related overflows either produce an unspecified result or else a machine-specific trap occurs.
    If possible, we want math- and memory-related traps to be treated as externally visible side-effects that must not be reordered with respect to other externally visible side-effects (much less be assumed to be impossible), but we recognize this may result in significant runtime overhead in some cases.
    The result of any signed left-shift is the same as if the left-hand shift argument was cast to unsigned, the shift performed, and the result cast back to the signed type.
    A read from uninitialized storage returns an unspecified value.
    It is permissible to compute out-of-bounds pointer values including performing pointer arithmetic on the null pointer. This works as if the pointers had been cast to uintptr_t. However, the translation from pointer math to integer math is not completely straightforward since incrementing a pointer by one is equivalent to incrementing the integer-typed variable by the size of the pointed-to type.
    The strict aliasing rules simply do not exist: the representations of integers, floating-point values and pointers can be accessed with different types.
    A data race results in unspecified behavior. Informally, we expect that the result of a data race is the same as in C99: threads are compiled independently and then data races have a result that is dictated by the details of the underlying scheduler and memory system. Sequentially consistent behavior may not be assumed when data races occur.
    memcpy() is implemented by memmove(). Additionally, both functions are no-ops when asked to copy zero bytes, regardless of the validity of their pointer arguments.
    The compiler is granted no additional optimization power when it is able to infer that a pointer is invalid. In other words, the compiler is obligated to assume that any pointer might be valid at any time, and to generate code accordingly. The compiler retains the ability to optimize away pointer dereferences that it can prove are redundant or otherwise useless.
    When a non-void function returns without returning a value, an unspecified result is returned to the caller. 

"

again, for a language without pointer arithmetic and with opaque byte representation (including alignment and padding) of all types, and with autoinitialization, the following are left:

    The value of a pointer to an object whose lifetime has ended remains the same as it was when the object was alive.
    Signed integer overflow results in two’s complement wrapping behavior at the bitwidth of the promoted type.
    Shift by negative or shift-past-bitwidth produces an unspecified result.
    Reading from an invalid pointer either traps or produces an unspecified value. In particular, all but the most arcane hardware platforms can produce a trap when dereferencing a null pointer, and the compiler should preserve this behavior.
    Division-related overflows either produce an unspecified result or else a machine-specific trap occurs.
    If possible, we want math- and memory-related traps to be treated as externally visible side-effects that must not be reordered with respect to other externally visible side-effects (much less be assumed to be impossible), but we recognize this may result in significant runtime overhead in some cases.
    The result of any signed left-shift is the same as if the left-hand shift argument was cast to unsigned, the shift performed, and the result cast back to the signed type.
    A data race results in unspecified behavior. Informally, we expect that the result of a data race is the same as in C99: threads are compiled independently and then data races have a result that is dictated by the details of the underlying scheduler and memory system. Sequentially consistent behavior may not be assumed when data races occur.
    memcpy() is implemented by memmove() (the destination can overlap the source [http://stackoverflow.com/questions/1201319/what-is-the-difference-between-memmove-and-memcpy]). Additionally, both functions are no-ops when asked to copy zero bytes, regardless of the validity of their pointer arguments.
    The compiler is granted no additional optimization power when it is able to infer that a pointer is invalid. In other words, the compiler is obligated to assume that any pointer might be valid at any time, and to generate code accordingly. The compiler retains the ability to optimize away pointer dereferences that it can prove are redundant or otherwise useless.
    When a non-void function returns without returning a value, an unspecified result is returned to the caller. 
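
As a concrete illustration of the signed left-shift rule in the list above, a sketch of my own (the conversion back to a signed type is implementation-defined in standard C, but is a plain two's complement reinterpretation on mainstream targets):

    #include <stdint.h>

    /* Left-shift with the "cast to unsigned, shift, cast back" semantics:
       with 32-bit int this makes friendly_shl(-1, 1) evaluate to -2 instead
       of being undefined. Shift amounts >= 32 are a separate case, covered
       by the shift-past-bitwidth rule. */
    static int32_t friendly_shl(int32_t x, unsigned n) {
        return (int32_t)((uint32_t)x << n);
    }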

my views on these for oot:

?not sure, todo: The value of a pointer to an object whose lifetime has ended remains the same as it was when the object was alive.

Signed integer overflow throws an exception, or (if unchecked) results in two’s complement wrapping behavior at the bitwidth of the promoted type.

Shift by negative or shift-past-bitwidth throws an exception, or (if unchecked) produces an unspecified result.

Reading from an invalid pointer throws an exception

Division-related overflows throw an exception or (if unchecked) produce an unspecified result.

math- and memory-related traps and all exceptions should be treated as externally visible side-effects that must not be reordered with respect to other externally visible side-effects (much less be assumed to be impossible)

The result of any signed left-shift is the same as if the left-hand shift argument was cast to unsigned, the shift performed, and the result cast back to the signed type.

the result of a data race is...: threads are compiled independently and then data races have a result that is dictated by the details of the underlying scheduler and memory system. Sequentially consistent behavior may not be assumed when data races occur.

memcpy() is not implemented by memmove() (the destination can NOT overlap the source [2]). memcpy is a no-op when asked to copy zero bytes, regardless of the validity of its pointer arguments.
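
A tiny sketch of my own showing why the overlap distinction matters; with overlapping source and destination only memmove is defined:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char buf[] = "abcdef";
        /* Shift "bcdef" one position to the left. Source and destination
           overlap, so memmove is required; using memcpy here is undefined
           behavior under both standard C and the rule proposed above. */
        memmove(buf, buf + 1, 5);
        printf("%s\n", buf);   /* prints "bcdeff" */
        return 0;
    }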

The compiler is granted no additional optimization power when it is able to infer that a pointer is invalid. In other words, the compiler is obligated to assume that any pointer might be valid at any time, and to generate code accordingly. The compiler retains the ability to optimize away pointer dereferences that it can prove are redundant or otherwise useless. (why? example: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/27/10537746.aspx ; in this example, the compiler decides to treat a use of a potentially invalid pointer as an assertion that the pointer is valid, and so then optimizes away other checks for the validity of the same pointer; see also [3])

When a non-void function returns without returning a value, NIL is returned to the caller.

IN GENERAL, PREFER RETURNING AN UNSPECIFIED RESULT OR AN EXCEPTION TO UNDEFINED BEHAVIOR, since 'undefined behavior' includes weird things like assuming that the invalid condition can't be reached, analyzing what must be true in order for that to be true, then optimizing other code paths away based on those assumptions.
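
A classic sketch of the kind of UB-driven reasoning warned about above (my own illustration, in the spirit of the MSDN example): because dereferencing a null pointer is undefined, the compiler may treat the dereference as proof that the pointer is non-null and delete the later check:

    /* Because the dereference on the first line would be undefined behavior
       if p were NULL, a compiler may conclude p cannot be NULL and remove
       the check below. On platforms where address 0 is mapped (e.g. kernel
       mode), the intended error path then silently disappears. Treating the
       dereference as a trap/exception instead keeps the check meaningful. */
    int read_or_default(int *p) {
        int value = *p;     /* dereference first */
        if (p == NULL)      /* may be optimized away */
            return 0;
        return value;
    }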

---

on invalid pointers: http://c2.com/cgi/wiki?NoPointers

---

C++ references vs pointers:

https://www.preney.ca/paul/archives/1051

C++11:

"It is unspecified whether or not a reference requires storage." (§8.3.2; para. 4)

"There shall be no references to references, no arrays of references, and no pointers to references." […] (§8.3.2; para. 5)

"[…] A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program […]" [§8.3.2; para. 5]

pointers can point to anything except references and bit fields [§8.3.1, para. 4].

"These restrictions... allows for the optimizing compiler to more intelligently optimize code." -- https://www.preney.ca/paul/archives/1051

---

need a primitive to actually erase memory, o/w memory erasure might be optimized away if the value being erased is never read again

(thanks [4])
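
A sketch of my own of the problem and one common workaround (memset_s from C11 Annex K is another option where it exists):

    #include <string.h>

    /* The compiler may delete this memset as a dead store: pw is never read
       again, so under the as-if rule the zeroing has no observable effect,
       and the secret can linger in memory. */
    void handle_secret_leaky(void) {
        char pw[64];
        /* ... use pw ... */
        memset(pw, 0, sizeof pw);
    }

    /* Workaround: call memset through a volatile function pointer so the
       compiler cannot prove the call is dead. A dedicated "really erase"
       primitive in the language would make the intent explicit. */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    void handle_secret(void) {
        char pw[64];
        /* ... use pw ... */
        memset_v(pw, 0, sizeof pw);
    }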

---

DSMan195276 306 days ago

The kernel uses -fno-strict-aliasing because they can't do everything they need to do by adhering strictly to the standard; it has nothing to do with it being too hard (the biggest issue probably being treating pointers as integers and masking them, which is basically illegal to do in C).

IMO, this idea would make sense if it were targeted at regular software development in C (and making it easier to not shoot yourself in the foot). It's not as useful to the OS/hardware people though, because they're already not writing standards-compliant code nor using the standard library. There's only so much it can really do in that case without making writing OS or hardware code more annoying than it already is.
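
The pointer-as-integer masking mentioned above looks something like this kernel-style idiom (my own sketch); standard C only guarantees that a pointer converted to uintptr_t and straight back compares equal to the original, so arithmetic on the integer value in between falls outside what the standard pins down:

    #include <stdint.h>

    /* Round a pointer down to a 64-byte boundary by masking its integer
       representation. Common in kernels and allocators, but not something
       strictly conforming C promises will yield a usable pointer. */
    static inline void *align_down_64(void *p) {
        uintptr_t u = (uintptr_t)p;
        return (void *)(u & ~(uintptr_t)63);
    }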


" Guaranteed immediate trapping might be difficult on some platforms, but a specification that allows to return an unspecified value or trap immediately - that can be done anywhere. "

cousin_it 306 days ago

Some time ago I came up with a simpler proposal: emit a warning if UB exploitation makes a line of code unreachable. That refers to actual lines in the source, not lines after macroexpansion and inlining. Most "gotcha" examples with UB that I've seen so far contain unreachable lines in the source, while most legitimate examples of UB-based optimization contain unreachable lines only after macroexpansion and inlining.

Such a warning would be useful in any case, because in legitimate cases it would tell the programmer that some lines can be safely deleted, which is always good to know.

regehr 306 days ago

The linked articles by Chris Lattner explain why this is very difficult.

http://blog.llvm.org/2011/05/what-every-c-programmer-should-... http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html

the problem is basically two things:

(on the first of these, a sample hypothetical unhelpful error message is given:

"warning: after 3 levels of inlining (potentially across files with Link Time Optimization), some common subexpression elimination, after hoisting this thing out of a loop and proving that these 13 pointers don't alias, we found a case where you're doing something undefined. This could either be because there is a bug in your code, or because you have macros and inlining and the invalid code is dynamically unreachable but we can't prove that it is dead. " )


so, autoinitialize variables to type-specific default value? or, compile-time error for potential access to undefined variable? the latter
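
For the "compile-time error for potential access" option, a small sketch of my own showing why the check has to be conservative:

    /* A definite-assignment ("potentially uninitialized") rule must reject
       the read of x below: the analysis cannot in general prove that the
       two uses of flag are correlated, even though x is in fact assigned
       on every path where it is read. Java's definite-assignment rules
       reject the equivalent code for exactly this reason. */
    int f(int flag) {
        int x;
        if (flag) x = 1;
        return flag ? x : 0;
    }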


http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html goes into detail about how allowing undefined behavior (or at least undefined value return) makes C faster:

Use of an uninitialized variable: no need to autoinitialize

Signed integer overflow: can optimize e.g. "X*2/2" to "X"; "While these may seem trivial, these sorts of things are commonly exposed by inlining and macro expansion." Another example: "for (i = 0; i <= N; ++i) { ... }"; the compiler can assume that the loop will iterate exactly N+1 times (as opposed to i maybe wrapping if N is large)

Oversized Shift Amounts: different platforms do different things in this case, so leaving this as an undefined result lets the compiler just let the platform decide, rather than putting in extra instructions

Dereferences of Wild Pointers and Out of Bounds Array Accesses: "To eliminate this source of undefined behavior, array accesses would have to each be range checked, and the ABI would have to be changed to make sure that range information follows around any pointers that could be subject to pointer arithmetic"

Dereferencing a NULL Pointer: "dereferencing wild pointers and the use of NULL as a sentinel. NULL pointer dereferences being undefined enables a broad range of optimizations: in contrast, Java makes it invalid for the compiler to move a side-effecting operation across any object pointer dereference that cannot be proven by the optimizer to be non-null. This significantly punishes scheduling and other optimizations. In C-based languages, NULL being undefined enables a large number of simple scalar optimizations that are exposed as a result of macro expansion and inlining."

Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the "int" as if it were a "float"). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results.... (eg an example is given where a loop that zeros memory is replaced by a call to a platform primitive for this purpose)
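
A small sketch of my own contrasting the two forms described above (using a 32-bit unsigned integer so the sizes match; the function names are mine):

    #include <stdint.h>
    #include <string.h>

    /* Undefined behavior: reads a uint32_t object through a float lvalue,
       violating the type-based (strict) aliasing rules. */
    float bits_to_float_bad(uint32_t u) {
        return *(float *)&u;
    }

    /* Well-defined: copy the representation with memcpy. Optimizing
       compilers typically lower this to a single register move, so the
       rule-following version costs nothing. */
    float bits_to_float_ok(uint32_t u) {
        float f;
        _Static_assert(sizeof f == sizeof u, "assumes 32-bit float");
        memcpy(&f, &u, sizeof f);
        return f;
    }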


a summary of my conclusions from the last few entries regarding undefined behavior and safety for oot is now in oot.txt:Invalid behavior


Carmack:

https://groups.google.com/forum/#!msg/racket-users/RFlh0o6l3Ls/8InN7uz-Mv4J

" I am using Racket http://racket-lang.org/ for the PC development environment, and Chibi Scheme for the embedded interpreter on Android. Download Racket and walk through their quick-intro-with-pictures to get a feel for the language and you should be able to hack on the sample scripts I have been writing pretty quickly. Note that Racket is a very extended Scheme targeted at desktop apps, the embedded version is just the basic standard. "

... " I am a big believer in functional programming (and static types) for large projects, but there is an undeniable bit of awkwardness compared to just imperatively poking things for small projects. That is one of the wins for Scheme -- I can make it super-easy to get easy things working, but it isn't just a "scripting language" unsuitable for large scale development. I am going to have to sort out my Racket / Chibi module strategy sometime soon, though.

As far as language choice goes, I don't claim to have broadly evaluated every possibility and chosen the optimal one.

Java or C# would have been more familiar to a broader base of game industry developers, but I really didn't want to drag in all the bulk of a JVM / .NET system, and a class focused world view seems less suited for the smaller scripting tasks.

Javascript would have been more familiar to a broader base of web industry developers, but I have basically no experience with javascript, and little desire to explore it (which is admittedly a fault of mine).

S-expression reading and writing is a strong factor for network communication, and I like the fact that there are available options for Scheme to interpret / compile / compile-to-C. I can see valid use cases for all of them, and I'm not sure how important each will be.

The bottom line is that I have been enjoying myself working with Racket / Scheme this year, and I have evidence that it has been objectively productive for me, so I'm going out on a bit of a limb and placing a bet on it. "

 	Gabriel Laddel 	6:17 AM (2 hours ago) Other recipients: jo...@oculus.com, us...@racket-lang.org 

While I'm happy to see John Carmack using a Lisp, you're missing the point. If you offer Lisp as a "scripting language" you'll fail to show off why *exactly* it's better. S-expressions shine because they save you from the entirely mechanical task of parsing. There are emergent properties of this notational scheme that are also lovely in their own right e.g., macros & extensibility via incremental compilation, but all of this originates in the notation.

If you build up a scripting language in which you pass around shader strings and whatever random formats happen into your program, you're going to destroy the underlying "lispyness" because anyone who wants to make a change to the implementation will have to understand the format in its entirety, or be forced to guess at the semantics implied by so-and-so syntax - at which point, what has Lisp gotten me? Sure, you'll have macros for the "high level language", but they're not magic. The reason that e.g., the Symbolics Lisp machine is so fondly remembered was because of the "crystalline pyramid of comprehensible mutually-interlocking concepts" (http://www.loper-os.org/?p=42) which allowed a user not intimately familiar with the machine or its architecture to make meaningful changes.

"Lisping" without addressing the underlying problems of OpenGL etc. is an absurd exercise. Even if this stunt were to make you personally a billion dollars, you'll end up unable to buy anything interesting because everyone's time is being spent ferrying around magic strings (where we are today).

Matthias Felleisen 7:06 AM (2 hours ago) Other recipients: a...@rocketsail.com, jo...@oculus.com, us...@racket-lang.org

On Jul 1, 2015, at 7:07 AM, AJ Campbell <a...@rocketsail.com> wrote:

> JSON is probably going to be the go-to format to send/receive renderable 3D packets. The thought of doing it with XML makes me feel ill. I'm sure Racket can handle JSON data (it very well might already for all I know),

It does of course.

> but Javascript found its way across the whole stack (among other reasons) because we love the idea of having a universal language to eliminate the serialization between client and server, plus it knocks down communication barriers between front-end and back-end team members.

That's actually a problem. When we launched Racket, Matthew insisted from the get-go that office partners do not communicate orally about things. Pop quiz: why was he right?

In my experience teaching courses where I give free choice of PLs for a distributed system, few languages make it easy for seniors to implement cross-process/network communication right. I always implement the systems by myself and let students vote to which format they want to stick: S-expressions (1958, but still better than what people invent now), XML, or JSON. For me, it's a two-line switch. For those on the losing end, it's a catastrophe. Pop quiz: why?

 	John Carmack 	7:47 AM (1 hour ago) Other recipients: gabrielva...@gmail.com, us...@racket-lang.org While I got tagged as a "technical idealist" for a long time, in reality I am deeply pragmatic. I fantasize about building a "crystalline pyramid of comprehensible mutually-interlocking concepts", but I know the folly of it. I have a not-so-large number of hours that I can possibly devote to this before it needs to stand on its own and provide real value to other developers, which means that it needs to be built on, and leverage, existing systems, warts and all.

S-expressions actually are one of the core wins from my use of lisp so far -- embracing read/write (and the associated bandwidth cost) as a wire protocol over yet another hand crafted binary format has been a significant win (however, the flexibility of the win seems to fight with static typing, perhaps at a fundamental level, in my limited typed-racket experience so far -- a good topic for discussion?).

Most users of this particular system will not interact with OpenGL at all, they will just move pre-built models around the world, play sound effects, start movie clips, etc. The fact that I could easily add the ability to hack on shader code, and soon geometry generation, is a huge pragmatic win.

I really do want to hear suggestions, do you have some concrete directions that you think would be useful?

I'm only three weeks into this project. Give me a little more time to change the world. :-)

carloscm 6 hours ago

LuaJIT is, by far, the best option for scripting in games/RT, thanks to the incredible JIT performance and its incremental GC with deadlines.

But there's something when you start playing with Lisp and then you want to keep using it more and more. Suddenly the classic Ruby/JS/Python/Lua language design feels boring and stale (ironically, given the age of Lisp).

After getting my feet wet in v2, I'm doubling down on Scheme for The Spatials v3, this time using S7 Scheme, a true gem of an interpreter (first class environments, hook on symbol-access for reactive programming, super extensive C FFI support, etc.)

reply

jarcane 4 hours ago

Going from Python to Lisp to Racket was an absolutely mind-blowing experience.

It literally changed my life, my perspective on programming, even what jobs I sought out.

reply

---

random parsing algorithm

https://jeffreykegler.github.io/Marpa-web-site/