lists-programmingConstructs2

---

" taw 4 points5 points6 points 3 years ago[-]

I haven't used Haskell's type system goodies (like type classes) that much, but OCaml's type system is anything but "very expressive".

OCaml won't even let you apply (fun a b -> a + b) to a pair of floats, let alone your own types.

You want 2d point and 3d point records, and to be able to say (a.x, a.y, a.z) and (b.x, b.y)? In OCaml, names of record fields must be unique inside the whole program, so the best you can do is (a.point3d_z, b.point2d_x).

These two things work fine in Haskell, Nemerle and even SML.
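(A minimal Haskell sketch of that first point: with a type class constraint, a single addition function works at many numeric types, which is exactly what OCaml's separate + and +. forbid. The name "add" is just for illustration.)

    -- one addition function, usable at any Num type
    add :: Num a => a -> a -> a
    add a b = a + b

    main :: IO ()
    main = do
      print (add (1 :: Int) 2)         -- 3
      print (add (1.5 :: Double) 2.0)  -- 3.5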

Or the other way around. OCaml is smart enough to know that (Printf.sprintf "%s <%d, %d>") is a function that takes a string and two integers and returns a string, while SML and Haskell have absolutely no way of expressing something like that.

Another example. Let's try to write a function that takes a function that does something with lists (like identity or reverse or length) and applies it to two different lists:

    (fun f -> (f [1; 2; 3], f ["a"; "b"]));;

It is a perfectly good function, but OCaml won't let it compile because its type system isn't good enough. It won't even let us explicitly type f (well, there's always Obj.magic, which ignores the type system completely, but OCaml, like C, has zero runtime checks, so if we make a mistake it segfaults instead of raising a pretty exception with a full stack trace).

Haskell won't let us compile this either:

    (\f -> (f [1, 2, 3], f ["a", "b"]))

Neither will Nemerle:

    fun(f) { (f([1, 2, 3]), f(["a", "b"])) }

What the heck is going on? Well, of course they have a pretty good excuse for this case - the most generic shape of the type term for f cannot be uniquely determined, and unification-based type inference engines (SML, OCaml, Haskell etc.) for fundamental reasons cannot work with such cases. Nemerle uses a constraint solver-based type inference engine, which is a bit more powerful, but doesn't handle this case (yet?). No matter which language you take, the type system will keep getting in your way, and perfectly valid programs will refuse to compile.

Static typing and type inference may still be well worth it, I'm not saying they're not. But "very expressive"? Come on.


sleepingsquirrel:

    OCaml is smart enough to know that (Printf.sprintf "%s <%d, %d>") is a function that takes a string and two integers and returns a string, while SML and Haskell have absolutely no way of expressing something like that.

Text.Printf
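(For context: Text.Printf gives Haskell a variadic printf driven by the format string, though, as taw notes further down, argument/format mismatches only surface at run time. A minimal example:)

    import Text.Printf

    -- the format string drives the (variadic) argument list; mismatches
    -- are reported at run time rather than compile time
    main :: IO ()
    main = printf "%s <%d, %d>\n" "point" (3 :: Int) (4 :: Int)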

    Haskell won't let us compile this either: (\f -> (f [1, 2, 3], f ["a", "b"]))

Try with "ghc -fglasgow-exts" or "hugs -98"...

    main = print $ f reverse

    f :: (forall a. ([a]->[a])) -> ([Integer],[Char])
    f x = (x [1,2,3], x "abc")
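(On a modern GHC the same program needs only the RankNTypes extension instead of -fglasgow-exts:)

    {-# LANGUAGE RankNTypes #-}

    f :: (forall a. [a] -> [a]) -> ([Integer], [Char])
    f x = (x [1,2,3], x "abc")

    main :: IO ()
    main = print (f reverse)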


Excedrin:

    Or the other way around. OCaml is smart enough to know that (Printf.sprintf "%s <%d, %d>") is a function that takes a string and two integers and returns a string, while SML and Haskell have absolutely no way of expressing something like that.

In my opinion, the functional unparsing http://www.brics.dk/RS/98/12/ style printf is much better than the OCaml or SML/NJ printf library functions. Here's an SML implementation of the idea: http://mlton.org/Printf


taw:

Internally, the OCaml stdlib (not the language) converts the format to something like the thing you linked to. It's somewhat more readable and more convenient to write:

    (sprintf "<%f, %f, %f>" a.x a.y a.z)

than:

    (sprintf `"<"F`", "F`", "F`">" $ a.x a.y a.z)

And either of them is far better than (printf-less OCaml):

    ("^" ^ (string_of_float a.x) ^ ", " ^ (string_of_float a.y) ^ ", " ^ (string_of_float a.z) ^ ">")

Maybe it's a small thing, but it's very common. If you add enough such small things, the difference can get really huge.


Excedrin:

There's an OCaml implementation here:

http://tkb.mpl.com/~tkb/software.html

Also, a Haskell version:

http://www.cs.nott.ac.uk/~gmh/wgp01/hinze-paper.pdf

My point was more that "... SML and Haskell have absolutely no way of expressing something like that" isn't exactly correct for some values of "something like that." Also, a format string isn't exactly the cleanest solution based on my idea of clean statically typed functional language code.

I'm glad that someone else has already figured out "~:D" and "~S" and they're part of the spec, documented and ready to use. Maybe it's a small thing, but it's very common. If you add enough such small things, the difference can get really huge.


taw:

Oh well, I guess it's good enough.

A major difference is that while the Haskell solution only does verification at run time (not that I really mind):

    "Mismatch between the argument types and the format string will cause an exception to be thrown at runtime."

OCaml does so at compile time, by smoke and mirrors :-)

    Printf.sprintf "%s<%d, %d>";;

"

http://www.brics.dk/RS/98/12/

toread: http://nemerle.org/blog/archive/2005/Mar-13.html

toread: http://www.paulgraham.com/arclessons.html

http://www.paulgraham.com/arclessons.html : "Implicit local variables conflict with macros."

toread: http://www.reddit.com/r/programming/search?q=erlang&sort=top

libraries:

- core: libs you need to use to implement the compiler or interpreter? or maybe just really important ones -- everyone should know these really well -- consider these part of the language.
- base: libs that are not core but still almost part of the language. Available on any (normal) installation. People should become familiar with them.
- incorporated but not shipped: some libs may be "owned" by the main team but not available on any (normal) installation.
- recommended: this category is lacking in many languages. These are libs that are not necessarily shipped with the language, not owned by the main team, maybe not as stable, but which are officially blessed. The idea isn't that the main team strongly endorses the current state of the lib, but rather that they say, "people are using this, this seems to be the frontrunner for what it does, you should be aware that it's out there and consider using it". Like GvR blessing Django.

i don't understand this: http://archive.netbsd.se/?ml=haskell-libraries&a=2007-07&m=4678486

random guy's idea, toread. one idea is that "start" and "stop" messages are basic.

http://rebelscience.blogspot.com/2008/10/cosa-new-kind-of-programming-part-iii.html


decided not to do this:

An exception to this equivalence (and to right-associativity) is that if the function as written takes k arguments, but it returns a function, then you are not allowed to write it with more than k arguments next to it; the extra arguments must be separated from the "real" arguments by a '(' or a ',' or a '.'. So, for example, if f is a function that takes two arguments, which are themselves functions, and returns g, which is a function that takes two arguments, then according to right-associativity you would be able to write:

1 2 x y f


plugins: envisage http://code.enthought.com/projects/envisage/


note: when B and C don't override, the diamond inheritance problem (http://en.wikipedia.org/wiki/Diamond_problem) is solved by keeping track of where the method originally came from, not where it immediately came from


http://lambda-the-ultimate.org/node/36373

A few parallel designs...

Peter Van Roy has written a few documents that survey different forms of concurrency. CTM and a recent chapter Programming Paradigms for Dummies [LTU Node] include such surveys.

Tim Sweeney's Next Mainstream Programming Language [LTU Node] includes a survey of a few and their 'places' in game design. He mentions transactions above, and support for wide-vector parallelization qualifies as a form of data-parallel computation.

My own interest is in 'open' distributed computing, but concurrency is part and parcel to distributed designs.

Data-parallel computation, possible with pure logic computation and pure functional computation, allows parallelization without affecting semantics. This is generally considered "parallelization" rather than "concurrency" because only the implementation is concurrent, not the semantics.

Some systems, like Google's MapReduce, aim to achieve data-parallel computations by shipping the computation to a much larger data-set. Wide-vector operations seen in supercomputers, graphics cards, and to a lesser degree in modern processors achieve this at a much smaller scale, but still achieve useful performance boosts.

The main challenge with data-parallel is deciding where to break up the computations - a challenge greatly exacerbated by the fact that the cost-to-benefit ratio of any given breakdown depends heavily on both the implementation architecture and on pressure from other sources of concurrency. If decided by hand, there is a high syntactic overhead, and the decision is likely too 'local' - the annotations will need be adjusted for different architectures. One may do better with some sort of adaptive design that uses profiling to decide where to break a computation down.

For distributed data-parallelization, one must also manage inevitable operations and node failures, which isn't a trivial problem. One can't fire and forget.

Functional Reactive Programming, with data-binding allows a great deal of parallelization and also has very nice features for caching, distribution and efficient multi-cast, disruption tolerance, and minimizing latencies. I personally think it to be an excellent bet for describing scene graphs. I think PVR called this 'synchronous dataflow programming' in paradigms for dummies. One can also do logic-programming with data-binding, which is neat.

In general, this is implemented such that updates invalidate caches eagerly and subscribers recompute lazily. One must avoid a common flaw, 'glitches', where, for example, x changes from 3 to 4 atomically but observers of (x*(x+2)) see 15->20->24 or 15->18->24.

Presumably, data-binding could be usefully combined with transactions to allow more than one variable to update as a single atomic action. However, I have not seen this in practice.
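(A minimal sketch of that combination using Haskell's STM, standing in for a data-binding layer with transactions: two bound variables update as one atomic action, and a derived read can never observe a half-applied update.)

    import Control.Concurrent.STM

    main :: IO ()
    main = do
      x <- newTVarIO (3 :: Int)
      y <- newTVarIO (5 :: Int)
      -- both bound variables update as a single atomic action
      atomically $ do
        writeTVar x 4
        writeTVar y 6
      -- the derived value is read in one transaction: no 'glitch'
      v <- atomically ((*) <$> readTVar x <*> readTVar y)
      print v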

Blackboard Architecture and Tuple Spaces achieve concurrency by having a bunch of active agents talking to a central database. One can generally wait until a tuple with a given pattern is identified. This has a major advantage in that the elements are not temporally coupled and have excellent disruption tolerance. One can also perform ACID transactions on the central database.
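(A toy sketch of the blocking "wait until a matching tuple appears" operation, here via Haskell STM's retry; takeMatching and put are invented names, and a real tuple space would of course be a proper database rather than a list in a TVar.)

    import Control.Concurrent.STM

    -- a toy tuple space: a shared list of tuples
    type Space t = TVar [t]

    -- block (via retry) until some tuple matches the pattern, then
    -- remove and return it, all in one transaction
    takeMatching :: Space t -> (t -> Bool) -> STM t
    takeMatching sp p = do
      ts <- readTVar sp
      case break p ts of
        (before, t:rest) -> writeTVar sp (before ++ rest) >> pure t
        (_, [])          -> retry

    put :: Space t -> t -> STM ()
    put sp t = modifyTVar' sp (t :)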

The main weaknesses of this design: (1) Coordination of components to perform a shared activity becomes very complex. Different components cannot see each other's transactions until they are completed, so multi-step behaviors between components cannot be performed atomically. Due to this limitation, individual components of the system will have inherent pressure to 'bloat up' to reinvent features rather than coordinate to achieve them. (2) Cleanup. Any given update might be observed by many components over a long period of time. It is rarely clear when or by whom a tuple should be eliminated from the repository. One can use expirations and such, but ultimately it's going to be heuristic and best guess. (3) Security and sharing (across organizations) - the use of a central database to coordinate elements is common, but not fundamentally secure. Such systems tend to have 'egg-shell' security - rather fragile, and a gooey middle that can rot quite easily after a breach.

Central point of failure is also a potential concern, but may be mitigated by varied design of the database for redundancy or distribution.

One could presumably use micro-databases to coordinate and secure them as object capabilities, but then distribution of the micro-db caps is a problem and so is multi-database pattern-matching and multi-database updates (one could fall back on FRP and distributed transactions, though).

Complex Event Processing achieves concurrency by having components produce and react to events.

Unlike message passing concurrency, there is reduced coupling between producer and consumer, usually via some sort of 'channels' or 'component BUS' or other publish subscribe model. The reduced coupling is advantageous for distribution and modularity, and also allows the system as a whole to continue even when individual nodes fail.

The main challenges of the CEP architecture occur upon startup and shutdown, as components may 'miss' important events. Usefully, one can hybridize with the 'database' approach and have the network itself keep a history to reduce startup and shutdown issues.

There is much promise in combining CEP with distributed transactions, where a sequence of events - including reactions from listeners - may be part of a common (hierarchical) transaction.

One can also optimize CEP by ensuring it is demand-driven: a source for events is made available in a registry, but isn't 'activated' until there is a corresponding sink for events. This design can, for example, automatically enable and disable a webcam based on whether there is a frame-demand. This sort of demand-driven nature can be chained through a CEP system by creating each CEP 'channel' with three facets: one to subscribe, one to update, and one suitable for data-binding that simply indicates whether someone is subscribed. (An object-per-facet is quite suitable for object capability security...)
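(A small sketch of that three-facet channel in Haskell; the names Channel, subscribe, update, and hasDemand are all illustrative, and a capability-secure version would hand out each facet separately.)

    import Data.IORef

    -- one facet to subscribe, one to update, and a data-bindable
    -- "is anyone subscribed?" facet for demand-driven chaining
    data Channel a = Channel
      { subscribe :: (a -> IO ()) -> IO ()
      , update    :: a -> IO ()
      , hasDemand :: IO Bool
      }

    newChannel :: IO (Channel a)
    newChannel = do
      subs <- newIORef []
      pure Channel
        { subscribe = \k -> modifyIORef subs (k :)
        , update    = \x -> readIORef subs >>= mapM_ ($ x)
        , hasDemand = not . null <$> readIORef subs
        }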

I also propose Simple Event Processing (SEP): like CEP with a few limitations aimed to achieve greater 'implicit' parallelization by sharing computation efforts. SEP forbids creation of new events, forbids operations that reference more than one event, and forbids side-effects (no internal state, no 'sends'). An SEP network could union, filter, and transform event streams, and may be combined with functional-reactive and data-binding to filter, transform, and union streams based on mutable policies. The main advantages of this tweak: (1) it becomes possible to automatically optimize multi-cast and sharing of common 'sub-network' computations (useful when a few thousand observers might be watching an ad-hoc 'query' whose results you don't wish to name and prepare explicitly in advance). (2) Demand-driven nature can be enforced automatically end-to-end on the SEP network. CEP could then be implemented at the very 'edges' of an SEP network. To my knowledge, these optimizations have never been targeted by an existing implementation.

Publish, Subscribe models focus on distribution of events and data (of the sort used by SEP/CEP/FRP). A given model determines how one describes a subscription, and how much the model 'knows' about the data being published. Of particular interest are 'domain-centric' schemes where one can (say) describe data and event feeds in terms of geographic and geometric regions (tell me about crime within 2 miles of lat/lon). This sort of scheme would work well with massively concurrent systems with a potentially large number of multi-cast consumers, such as an MMORPG.

Publish-subscribe can be securely composed by having capability-secure but composable 'registries' for data-sources. I.e. I can create a new registry and add some sources, have it compose Google's registry, so I can see Google's data but Google cannot see mine. The trick is to achieve this coordination without introducing a 'demand' on the event and data sources, without hindering multi-cast optimizations, and potentially under some anonymity/blinding constraints (e.g. via a trusted intermediary). That takes some design effort, but is doable. By dmbarbour at Thu, 2009-10-15 19:50

Thanks - Looking also for coarser grained mechanisms

For a crude example, servlets operating under a web application server; or simple 'grep' and 'wc' style utilities tied together in a Unix pipeline. Etc.

Again, thanks much!

Scott By scottmcl at Thu, 2009-10-15 23:01

p.s. On the lower level control mechanism .....

Given the ire drawn by the threads + locks model, I'm curious what folks think is the most elegant concurrency control mechanism/framework.

Again, mucho, mucho thanks in advance.

Scott By scottmcl at Thu, 2009-10-15 23:04

Most Elegant Threads...

The threads+locks+shared-state model contradicts many principles associated with 'elegance'. For example, elegant systems should be composable, but two lock-based programs - independently correct - may deadlock when composed.

Shared state design also introduces challenges for explicit memory management, mutable collections (esp. with idioms that map an operation across the collection), visitor patterns, etc.

That said, threads and shared state can work without locks. Wait-free and lock-free algorithms and data-structures can be reasonably elegant. And a language with greater support for 'persistent' structures (such as ropes, copy-on-write maps, etc.) avoids many problems of shared state. By dmbarbour at Fri, 2009-10-16 16:56
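(What 'persistent' buys you, in a couple of lines of Haskell: an update returns a new version and leaves the old one intact, so other threads can keep reading it safely.)

    import qualified Data.Map as M

    m0, m1 :: M.Map Int String
    m0 = M.fromList [(1,"a"), (2,"b")]
    m1 = M.insert 3 "c" m0   -- m0 is unchanged; both versions stay usable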

wait-free can be hard, too

i got the impression that writing a wait or lock free system was one of those Really Hard To Get Right So 99% Of Us Shouldn't Think They Can things. By raould at Fri, 2009-10-23 21:25

totally agreed

Wait-free is really, really, really hard to get right.

That's why I said 'elegant' and not 'easy'. Like ballet. ^_^ By dmbarbour at Fri, 2009-10-23 21:28


http://en.wikipedia.org/wiki/Fortress_%28programming_language%29


http://lambda-the-ultimate.org/node/3465


note: matrices are both containers (as arrays of numbers) and functions (as linear operators), providing a potential reason to distinguish between __get (subscript) and __call (function call). however, they are really only linear operators with respect to a basis, so maybe these are separate objects? and __call = __get captures the identity of sequences with functions on integers
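(A sketch of the __call = __get identity in Haskell: represent a sequence as a function on integers, so subscripting and application coincide. Seq and at are invented names.)

    newtype Seq a = Seq { at :: Int -> a }

    squares :: Seq Int
    squares = Seq (\i -> i * i)
    -- 'at squares 3' is both "squares[3]" and "squares(3)": 9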


should be a shortcut for this sort of thing:

    # 1. compute conditional probabilities
    for i in range(len(domain.attributes)):
        for j in range(len(domain.attributes[i].values)):
            for k in range(len(domain.classVar.values)):
                pc[i][j][k] = (pc[i][j][k] + self.m * p_class[k]) / (n_class[k] + self.m)

list of neat programming languages: http://tldp.org/HOWTO/AI-Alife-HOWTO-7.html


some people who think about programming languages:

Lambda the Ultimate

Dick Gabriel (Sun?) http://www.dreamsongs.com/, Steele, GvR, Sussman, the Haskell guys, etc.


read the rest of this: http://www.dreamsongs.com/10ideas.html


listen to this:

http://www.podbean.com/podcast-detail-episode/342059/episode-19-keynote----50-in-50


why you need to use a symbol table during parsing of c++:

http://compilers.iecc.com/comparch/article/98-07-199 http://stackoverflow.com/questions/1725975/no-symbol-table-in-go


why the "go" language doesn't have assertions:

" Where is assert?

Go doesn't provide assertions. They are undeniably convenient, but our experience has been that programmers use them as a crutch to avoid thinking about proper error handling and reporting. Proper error handling means that servers continue operation after non-fatal errors instead of crashing. Proper error reporting means that errors are direct and to the point, saving the programmer from interpreting a large crash trace. Precise errors are particularly important when the programmer seeing the errors is not familiar with the code.

The same arguments apply to the use of assert() in test programs. Proper error handling means letting other tests run after one has failed, so that the person debugging the failure gets a complete picture of what is wrong. It is more useful for a test to report that isPrime gives the wrong answer for 2, 3, 5, and 7 (or for 2, 4, 8, and 16) than to report that isPrime gives the wrong answer for 2 and therefore no more tests were run. The programmer who triggers the test failure may not be familiar with the code that fails. Time invested writing a good error message now pays off later when the test breaks.

In testing, if the amount of extra code required to write good errors seems repetitive and overwhelming, it might work better as a table-driven test instead. Go has excellent support for data structure literals.

We understand that this is a point of contention. There are many things in the Go language and libraries that differ from modern practices, simply because we feel it's sometimes worth trying a different approach. "

I note that if you have exceptions (Go doesn't), and if you can catch an exception and tell the language to continue from where it left off (unlike Python, though I think some languages allow this), then a tester can cause assertions to be ignored just by calling "main", catching assertion exceptions, ignoring them, and telling the computer to continue from where it left off.


interesting bit from the Go FAQ (although I guess it's not relevant to Jasper, because we don't have inheritance, because of the 'you can only refer to an interface' philosophy):

Why is there no type inheritance?

Object-oriented programming, at least in the best-known languages, involves too much discussion of the relationships between types, relationships that often could be derived automatically. Go takes a different approach.

Rather than requiring the programmer to declare ahead of time that two types are related, in Go a type automatically satisfies any interface that specifies a subset of its methods. Besides reducing the bookkeeping, this approach has real advantages. Types can satisfy many interfaces at once, without the complexities of traditional multiple inheritance. Interfaces can be very lightweight: having one or even zero methods in an interface can express useful concepts. Interfaces can be added after the fact if a new idea comes along or for testing, without annotating the original types. Because there are no explicit relationships between types and interfaces, there is no type hierarchy to manage or discuss.

It's possible to use these ideas to construct something analogous to type-safe Unix pipes. For instance, see how fmt.Fprintf enables formatted printing to any output, not just a file, or how the bufio package can be completely separate from file I/O, or how the crypto packages stitch together block and stream ciphers. All these ideas stem from a single interface (io.Writer) representing a single method (Write). And that's only scratching the surface.


think about how to express concurrency constructs more elegantly using graphs

how to make monitors and Timber constructs in an extensible way


tables, associative arrays, maps, dicts, hashtables. i guess "map" is a good name. in jasper, the map data structure is also actually a function, so this is even better.
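(The map-as-function idea, sketched in Haskell: a finite map viewed as a partial function from keys to values. asFunction is an invented name.)

    import qualified Data.Map as M

    asFunction :: Ord k => M.Map k v -> (k -> Maybe v)
    asFunction m = \k -> M.lookup k m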


http://lambda-the-ultimate.org/node/1394

....

Other languages are designed exactly for these things that Proebsting mentions: XML manipulation is prominent in CDuce, XDuce, and Scala, and in the LL2 video Proebsting proposes Erlang as a solution to the concurrency problem.


as noted on http://en.wikipedia.org/wiki/Parameter_covariance , there is an issue if, for example, String is a subtype of Object, and arrays of strings are subtypes of arrays of objects, since in fact you can't substitute a String array for any Object array; the String array won't be able to accommodate an insertion of an Int, for example.

in Jasper, there would be an Object array implementation (parametric polymorphism/generic programming, using a type variable, i.e. "?x Array"), and a String array implementation. If a list of only Strings was required, the String array implementation would be selected, as it is more specific.


http://en.wikipedia.org/wiki/Parameter_covariance#Java has an interesting feature with the wildcards in parametric polymorphism that can have upper or lower bounds (covariant or contravariant)


some interesting ones from MUMPS http://en.wikipedia.org/wiki/MUMPS:

Arrays: are created dynamically, stored as B-trees, use almost no space for missing nodes, can use any number of subscripts, and subscripts can be strings or numeric (including floating point). Arrays are always automatically stored in sorted order, so there is never any occasion to sort, pack, reorder, or otherwise reorganize the database. $ORDER, $ZPREVIOUS, and $QUERY functions provide efficient traversal of the fundamental array structure, on disk or in memory.

    for i=10000:1:12345 set sqtable(i)=i*i
    set address("Smith","Daniel")="dpbsmith@world.std.com"

Local arrays: variable names not beginning with caret are stored in memory by process, are private to the creating process, expire when the creating process terminates. The available storage depends on partition size, but is typically small (32K).

Global arrays: ^abc, ^def. These are stored on disk, are available to all processes, and are persistent when the creating process terminates. Very large globals (eg, hundreds of megabytes) are practical and efficient in most implementations. This is MUMPS' main "database" mechanism. It is used instead of calling on the operating system to create, write, and read files.

Indirection: in many contexts, @VBL can be used, and effectively substitutes the contents of VBL into another MUMPS statement. SET XYZ="ABC" SET @XYZ=123 sets the variable ABC to 123. SET SUBROU="REPORT" DO @SUBROU performs the subroutine named REPORT. This is effectively the operational equivalent of "pointers" in other languages.

Piece function: This breaks variables into pieces guided by a user specified separator character. Those who know awk will find this familiar. $PIECE(STRINGVAR,"^",3) means the "third caret-separated piece of STRINGVAR." It can appear as an assignment target. After

SET X="dpbsmith@world.std.com"

$PIECE("world.std.com",".",2) yields "std" SET $P(X,"@",1)="office" causes X to become "office@world.std.com" (note that $P is equivalent to $PIECE and could be written as such).

Order function

Set stuff(6)="xyz",stuff(10)=26,stuff(15)=""

$Order(stuff("")) yields 6, $Order(stuff(6)) yields 10, $Order(stuff(8)) yields 10, $Order(stuff(10)) yields 15, $Order(stuff(15)) yields "".

Set i="" For Set i=$O(stuff(i)) Quit:i= Write !,i,10,stuff(i)

Here, the argument-less For repeats until stopped by a terminating Quit. This line prints a table of i and stuff(i) where i is successively 6, 10, and 15.
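($ORDER maps nicely onto sorted-map operations in other languages; e.g. a Haskell sketch with Data.Map standing in for MUMPS's B-tree arrays. order is an invented name.)

    import qualified Data.Map as M

    -- $ORDER as "the next existing subscript"
    order :: Ord k => M.Map k v -> k -> Maybe k
    order m k = fst <$> M.lookupGT k m

    -- with stuff = M.fromList [(6,"xyz"), (10,"26"), (15,"")]:
    -- order stuff 0 == Just 6; order stuff 8 == Just 10; order stuff 15 == Nothing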

Multi-User/Multi-Tasking/Multi-Processor: MUMPS supports multiple simultaneous users and processes even when the underlying operating system does not (e.g. MS-DOS). Additionally, by specifying a machine name in a variable (as in SET ^|"DENVER"|A(1000)="Foo"), you can access data on remote machines.

from icon http://en.wikipedia.org/wiki/Icon_programming_language

  every write(someFunction(i to j))

like

  someFunction.i:j write map

notes on PlusCal http://research.microsoft.com/en-us/um/people/lamport/pubs/pluscal.pdf

from Lamport's website: "

The PlusCal Algorithm Language. Theoretical Aspects of Computing-ICTAC 2009, Martin Leucker and Carroll Morgan, editors. Lecture Notes in Computer Science, number 5684, 36-60. PDF. PlusCal (formerly called +CAL) is an algorithm language. It is meant to replace pseudo-code for writing high-level descriptions of algorithms. An algorithm written in PlusCal is translated into a TLA+ specification that can be checked with the TLC model checker [126]. This paper describes the language and the rationale for its design. A language manual and further information are available here.

An earlier version was rejected from POPL 2007. Based on the reviews I received and comments from Simon Peyton-Jones, I revised the paper and submitted it to TOPLAS, but it was again rejected. It may be possible to write a paper about PlusCal that would be considered publishable by the programming-language community. However, such a paper is not the one I want to write. For example, two of the three TOPLAS reviewers wanted the paper to contain a formal semantics--something that I would expect people interested in using PlusCal to find quite boring. (A formal TLA+ specification of the semantics is available on the Web.) I therefore decided to publish it as an invited paper in the ICTAC conference proceedings. "

Most of the PlusCal examples are not suitable as programs because they define things in terms of brute-force enumeration through all possibilities, from which items meeting certain criteria are chosen (this is fine for PlusCal's purpose, but here I'm looking at it for ideas for programming language constructs). Basically, a common idiom in PlusCal is to pose an NP-complete problem whenever you want something done; that is, to give a set of easily-checked conditions that you want satisfied and a set of potential solutions to enumerate over. However, it's still worth combing through for constructs:

function ("algorithm") defn -- already in jasper

predicate defn -- already in jasper (via the constraint satisfaction stuff)

the above mentioned brute force syntax: e.g. Divides(i,j) = \exists k \in 0..j : j = i * k. this style requires lists, forall, exists, and "with" nondeterminism, see below.

\forall and \exists, which take predicates over elements and return predicates over lists, written e.g. \exists k \in LIST : CONDITION. these should be lib fns in jasper.
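(In Haskell these are just 'all' and 'any'; e.g. the Divides predicate above:)

    divides :: Int -> Int -> Bool
    divides i j = any (\k -> j == i * k) [0..j]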

multiple assignment: u := v || v := u is like t := u; u := v; v := t. the key is that all the things on the right are evaluated before any of the things on the left.
  in Jasper, this will be a special case of graph assignment.

"with" nondeterminism: using the "with" keyword, a value is selected via exhaustive search that satisfies a condition. in jasper, this is given by the constraint satisfaction programming stuff, which needs to be augmented by a possibility list.

conversion of the set of values satisfying constraints into an explicit list, e.g. B \in {C \in Perms(A) : condition1 AND condition2}. In this case, b/c we are using "B \in {}", this list is just used as another constraint, but the {} operator itself seems like it could be used to create an explicit list.

goto, and labels

"await", which blocks until some condition is true e.g. await y=0

arrays

if

atomic operations e.g. "<y := 0>" in the pseudo-code or "label1: y := 0" when labels are used to indicate atomicity

n-way fork

boolean ops, std comparison ops, set union

can quantify (i.e. exhaustively search) over finite arrays = quantify over functions with finite domains (e.g. Auto(S) = {f \in S -> S : ...})

set subtraction e.g. Nat \ {0} for positive integers

filter, e.g. {i \in Nat : i > 0}

map, e.g. {i+1 : i \in Nat}

"domain" function, which fetches the finite domain of some fn over a finite domain (i.e. some array)

while loops

"either S1 or S2 or ...." for nondeterministic branching

GCL's do (repetition), from http://en.wikipedia.org/wiki/Guarded_Command_Language :

The repetition executes the guarded commands repeatedly until none of the guards are true. Usually there is only one guard. Syntax:

    do G0 -> S0 [] G1 -> S1 ... [] Gn -> Sn od

example: Euclid's algorithm for GCD:

    a, b := A, B;
    do a < b -> b := b - a
     [] b < a -> a := a - b
    od
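(The same algorithm reads naturally with Haskell guards playing the role of GCL guards: recurse while either guard holds, stop when neither does. euclid is an invented name.)

    euclid :: Int -> Int -> Int
    euclid a b
      | a < b     = euclid a (b - a)
      | b < a     = euclid (a - b) b
      | otherwise = a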

GCL's cond (called if) is nondeterministic, in that if more than one condition is true, any one of the associated statements may be executed. my note: a variant in which the associated statements of all true conditions are executed seems more appropriate.

note the distinction between the use of cond in imperative languages and the use of case in expressions; if none of the conditions are true in the imperative setting, an obvious default is to do nothing; whereas in the expression (functional) setting there is no obvious default. however, one may also view the imperative commands as returning "success" or "fail" in addition to their side-effects; in this case, it is unclear whether an imperative cond construct should return success or fail if none of the conditions are met (my pick would be to have it return fail, but then to have a lightweight syntax for disregarding failure)

exceptions may be status reports, warnings (resumable, and resumed by default), or errors (not resumable). lightweight syntax for ignoring errors on unimportant statements. that is, exceptions and logging can be combined. maybe we need a facility for a logger to ask to get all messages from subroutines, even if the caller deems them unimportant and intercepts them. perhaps intercepted messages transform into "ghost" messages, which still percolate up and cannot be blocked (a nice analog with Christian theology, by the way: after their death, the ghosts of messages cannot have any material effect upon execution, e.g. by causing an error, but must still ascend up the call chain to heaven, i.e. the First Cause, so that when they reach heaven a record of their content may be written in the log, perhaps in order to assess the performance of the program which generated the message).

so, note that both cond and do seem to admit any/all modal variants; on each iteration of the do, are all applicable branches called, or is just one nondeterministically picked?

according to http://en.wikipedia.org/wiki/Guarded_Command_Language , GCL is good for expressing quasi-delay-insensitive (QDI) circuits, which sound like a good neural async computation model for the brain. QDI circuits are delay-insensitive except for the presence of isochronic forks. asymmetric isochronic forks mean that one path is guaranteed to be shorter than the other. apparently pure delay-insensitive circuits are not Turing-complete, and QDI circuits are, so QDI is a candidate for one sort of weakest constraint.


string scanning from Icon:

 s := "this is a string"
 s ? {                               # Establish string scanning environment 
     while not pos(0) do  {          # Test for end of string 
         tab(many(' '))              # Skip past any blanks
         word := tab(upto(' ') | 0)  # the next word is up to the next blank -or- the end of the line
         write(word)                 # write the word
     }
  }

F# LINQ and quoting: http://tomasp.net/articles/dynamic-flinq.aspx


"Perl's ability to "glue together" other programs, or transform the output of one program so it can be used as input to another. "

"A good scripting language is a high-level software development language that allows for quick and easy development of trivial tools while having the process flow and data organization necessary to also develop complex applications. It must be fast while executing. It must be efficient when calling system resources such as file operations, interprocess communications, and process control. A great scripting language runs on every popular operating system, is tuned for information processing (free form text) and yet is excellent at data processing (numbers and raw, binary data). It is embeddable, and extensible. Perl fits all of these criteria. "


http://www.softpanorama.org/People/Wall/index.shtml


Kant's category of community puts a number of parts together into one concept, corresponding to the judgement which is the exclusive OR of the parts. this corresponds to an enum in a programming language.
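(That exclusive-OR-of-parts reading is literally a sum type; the names here are placeholders.)

    data Community = PartA | PartB | PartC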

---

model-based evaluation of clustering validation measures


In the following, "=" means "proportional to"

Let V1_1 refer to the population of neurons in V1 receiving input from LGN, and V2_1 to the population in V2 receiving input from V1. Let #SOME_POP mean the number of cells in population SOME_POP; e.g. #V1_1 means "# of cells in V1_1". Let SOME_AREA_IN be the population of neurons serving as input to SOME_AREA. Let SOME_AREA_TO_SOME_OTHER_AREA be the population of neurons projecting from SOME_AREA to SOME_OTHER_AREA.

Assume that:

(1) #V2 = #V1
(2) #V1 = #V1_in^(3/2)
(3) V2_1 is doing a 3/2 wavelet transform of whatever its input from V1 is
(4) V1_1 dominates the count of cells in V1 (i.e. it scales up as fast or faster than any other subpopulation in V1, to the extent that if you count #V1 you are basically counting #V1_1), and likewise V2_1 dominates #V2
(5) every cell has an output and every cell's output gets used
(6) V2 gets all its input from V1
(100) V1 is not doing any other computation after its initial wavelet transform and before it sends stuff to other areas (i.e. V1 basically only contains V1_1)

Because of (3), we have

(7) #V2_1 = #V2_1_in^(3/2)

Because of (6) and the definition of V2_1, we have

(8) V2_1_in = V1_TO_V2

Because of (4) and (1),

(9) #V1 = #V1_1 = #V2 = #V2_1

therefore, by (7), (8), and (9),

(10) #V2_1 = #V1_to_V2^(3/2), so #V1_to_V2 = #V1_1^(2/3)

By (100), each neuron in V1_to_V2 corresponds to exactly one neuron in V1_1.

Since #V1_to_V2 = #V1_1^(2/3), we know that #V1_to_V2 < #V1_1, and therefore there are a number of neurons in V1_1 which aren't sending their outputs to V2_1 (call them "orphans"). Since V2_1 is defined as that part of V2 receiving input from V1, there are a number of neurons in V1_1 which aren't sending their outputs to any part of V2. The number of these is proportional to #V1_1^(1/3). By (5), these outputs must be going somewhere, so they must be going to some other area besides V2.

Let's say the orphans are going to Vx_1, and that they are the only input to Vx_1. We would have #Vx_1 = #orphans^(3/2) = (#V1_1^(1/3))^(3/2) = #V1_1^(1/2). So, the scaling relation between #V1_1 and #Vx_1 would not be the same as between #V1_1 and #V2_1. Assuming that #Vx_1 dominates the count of some area VX, we have that there is some cortical area VX that doesn't scale proportionately with V1.

So, if we keep assumptions (1)-(6), then either we have various cortical areas which don't scale proportionately to each other, or (100) must be wrong. (100) could be wrong if we let each cortical area contain an additional "reduction" computation which is done after the initial wavelet transform and which combines the results of many neurons into fewer. So, there could be a V1_2 population of neurons with #V1_2 = #V1_1^(2/3), such that it is V1_2 that projects to V2, not V1_1.


Is the embeddable aspect of Falcon versatile?

I talked diffusely about that in the Regex example above, but other than the reconfigurability and sharing of pre-loaded modules across application VMs, we have more. The VM itself has many virtual methods that can be overloaded by the target application, and is light enough to allow a one-VM-per-script model. Heavy scripts can have their own VM in the target application, and can happily be run each in its own thread; yet VMs can be recycled by de-linking just-run scripts and linking new ones, keeping the existing modules so that they're already served to scripts needing them.

The VM itself can interact with the embedding application through periodic callbacks and sleep requests. For example, a flag can be set so that every sleep request in which the VM cannot swap in a coroutine ready to run is passed to the calling application, which can decide to use the idle time as it thinks best. For instance, this allows spectacular quasi-parallel effects in the FXChat binding, where the sleep() function allows XChat to proceed. This may seem a secondary aspect, but other engines are actually very closed on this; once you start a script or issue a callback, all that the application can do is hope that it ends soon. With Falcon you can interrupt the target VM with simple requests that will be honoured as soon as possible, and eventually resume it from the point it was suspended and inspected.

Since 0.9 we have introduced even a personalized object model. Falcon instances need not be full-blown Falcon objects; the application may provide its own mapping from data to items travelling through the Falcon VM. Compare this with the need of creating a dictionary at each new instance, and having to map each property to a function retrieving data from the host program or from the bound library.

Other classes which you can override are the module loader, which may provide Falcon modules from other types of sources, or from internal storage in embedding applications, and, since 0.9, the URI providers. Modules and embedding applications can register their own URI providers, so that opening a module in the app: space would turn into a request to get a module from an internally provided resource, and opening a stream from a script via app: would make it possible to communicate binary data via streams to other parts of the application.

Frankly, we did our best to make our engine the most versatile around. They say LUA is very versatile, as you can reprogram it as you wish. But then, that is true for any open source project.


How will the new multithreading design in version 0.9 innovate the scripting language panorama?

There are two good reasons why multithreading in scripting languages is a delicate matter (one that many didn't even want to face). The first is that multithreading can break things. In good multithreading (multithreading which is allowed to actually exploit the parallel computational power of modern architectures without excessive overhead), there is no way to recover from an error in a thread. A failing thread is a failing application, and that is a bit problematic to frame in the controlled execution concept behind scripting language virtual machines.

The second reason is, as the Lua developers point out, that a language where a = 0 is not deterministic cannot be proficiently used in multithreading. Some scripting languages make a = 0 deterministic and visible across threads by locking every assignment instruction, and that is a performance killer under many aspects. It doesn't only deplete performance of the script itself; in case of concurrent programming in an application, it may severely deplete the host application's performance by forcing unneeded context switches.

We opted for a pure agent-based threading model. Each thread runs a separate virtual machine, and communication across threads can happen only through specialised data structures. In this way, each virtual machine can run totally unhindered by global synchronisation. It is possible to share raw memory areas via the MemBuf item type, or to send complete objects created upon separate elaboration via an interthread item queue.

The point is that, in our opinion, multithreading in scripting languages cannot be seen as multithreading in low level languages, where each operation can be mapped to activities in the underlying parallel CPUs. The idea of mutex/event-based parallel programming is to be rejected in super-high level languages such as scripting languages, as there are too many basic operations involved in the simplest instruction. Since, in complex applications written even in low level languages, those primitives are used to create higher level communication mechanisms, our view is that multithreading in scripting languages should provide exactly those mechanisms, without trying to force the scripts to do what they cannot proficiently do, that is, low level synchronization primitives.

When I write a server, I find myself struggling to create complex synchronisation rules and structures through those primitives, avoiding using them directly, and I don't see why we should bestow the same struggle on script users. The realm where primitive synchronisation is useful is not a realm where scripting languages should play a direct role; it's where you would want to write a C module to be used from the scripting language anyhow.

In 0.9 we have introduced an inter-thread garbage collector that accounts for objects present in more than one virtual machine. This is already exploited via the sharing of MemBuf instances, but we plan to extend this support to other kinds of objects. For example, it is currently possible to send a copy of a local object to another thread via an item queue (the underlying data, possibly coming from a module or from an embedding application, can actually be shared; it's just the representation each VM has of the object that must be copied). This makes it a bit difficult to cooperate on creating complete objects across threads, and even if this works in terms of agent-based threading, we're planning to use the new interthread GC system to be able to send deep items across threads. Since 0.9, it is already possible to create deep data externally (i.e. in the embedding application or in a module) and send it to a VM in a different thread.

The only problem left in doing it natively across two different VMs is ensuring that the source VM won't be allowed to work on the object, or on any of the data inside it, while the target VM is working on it. Even if this may seem a limitation, it's exactly what the "object monitor" approach to multithreading dictates, and it is perfectly coherent with our view of higher level parallel abstraction. The 0.9 version also introduces the mechanism of interthread broadcasts, with message oriented programming extended to interthread operations. We still have to work that out completely, but that's the reason why we're numbering this release range 0.9.

Finally, as the VMs have native multithread constructs now, we may also drop the necessity of having different VMs for different threads, as each thread may just operate on its own local context, while common operations on the VM (such as loading new modules) can be easily protected. Still, we need to consider the possibility of multiple VMs living in different threads, as this is a useful model for embedding applications.


how to safely pass references:

" http://bartoszmilewski.wordpress.com/2009/03/23/types-for-concurrency/

Can a good type system prevent concurrency errors? Or is this a quest for the Holy Grail?

There are two parts to this question, corresponding to two major types of concurrency errors:

   1. Preventing data races
   2. Preventing deadlocks

I'll start with the first one.

Data races occur only when memory is shared between threads. Disallow sharing and data races are gone! In fact there is a name for threads that don't share memory: processes. It's perfectly feasible to have a concurrent language that disallows sharing; Erlang is one (see my post about Erlang). The trick is to always pass data between threads by value. This is especially easy in functional languages.
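(A concrete Haskell rendering of that share-nothing point: the only communication is a value sent over a channel, so there is nothing to race on.)

    import Control.Concurrent
    import Control.Concurrent.Chan

    main :: IO ()
    main = do
      ch <- newChan
      _  <- forkIO (writeChan ch "hello from the worker")
      readChan ch >>= putStrLn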

Non-functional languages like C++, Java, or D (I was told by Walter Bright, the creator of D, to always use the full name, the D programming language, so that search engines can index it properly) tend to share data almost by default (see, however, this post).

In Java, all non-primitive types have reference semantics. When you pass a Java object between threads, you're only passing a handle; the object behind it becomes accessible from both threads.

C++ at least has means to express passing by value and move semantics for user-defined types. Still, it's up to the programmer to use them.

Who ordered Guava?

For every type-system idea there is one or more dialects of Java that implement it. I'll start with an older attempt at data-race-free Java called Guava, as it illustrates some of the basic premises.

-Explicit sharing

The most important step, if we don't want to completely ban the sharing of data, is to regulate it. Let the programmer explicitly mark the data destined for sharing as such. The corollary is that the data that is not marked for sharing cannot be shared. This can be accomplished, for instance, by making all objects thread-local by default, or by using type modifiers that prevent references to such objects from escaping.

In Guava, the data type designed for sharing is called a Monitor. As the name suggests, all access to a Monitor is automatically synchronized by the Monitor's lock. This, incidentally, eliminates the need for the synchronized keyword, which is absent from Guava.

The non-shared data types are either Objects or Values.

Objects are just like regular Java Objects, except that they don't have a built-in lock, since they can never be shared between threads.

Values are either primitive values, like integers, or user-defined types with value semantics.

Monitors, Objects, and Values are collectively called instances.

-Value semantics

When you pass a Value, the compiler will make a deep copy of it (well, sort of: the monitors embedded in a Value are not deep-copied). Since deep copying might be expensive, Guava defines an operator "move", which nulls the source. The syntax is:

  v2 = v1--;

The value v1 becomes null after the move to v2. This is similar to C++'s unique_ptr and std::move.

-Ownership

The biggest problem in lock-based concurrency is to make sure that the correct lock(s) are taken when accessing shared data. In Guava, all of a Monitor's data is protected by that Monitor's lock. As long as they stay inside that Monitor, nothing bad can happen to them from the point of view of concurrency.

Values stored inside a Monitor are never accessible outside of the Monitor; only their copies may be passed out.

The same is not true about Objects. Since Objects have reference semantics, there is a real danger that Object references might escape the Monitor that protects them. Imagine a situation where two Monitors have references to the same Object. It is possible then that two threads may operate on that Object at the same time: one entering through one Monitor, the other through the other Monitor. We have a data race!

Therefore it is important that every Object have an owner at all times. The Object's owner is either a Value or a Monitor. (The exception is a fresh Object that's just been allocated; it has no owner until it is assigned to a variable.) Since an Object may only be owned by at most one Monitor, it is that Monitor that protects it from simultaneous (racy) access.

-Regions

All Objects that are owned by a particular Monitor or Value form a region. Equivalently, assigning a monitored region to an object specifies what lock must be held when accessing it.

All instances may contain (references to) monitors, but monitors are not owned by anybody. References to the same monitor may appear in multiple regions and may be freely passed around. It is thus up to programmers to define an ordering scheme for their monitors in order to avoid deadlocks.

How can we protect Objects from moving between regions and acquiring multiple owners? We need a way to control aliasing.

Here are some Guava rules for passing Objects. A method may declare its Object parameter as either kept or lent. (By default, parameters to Object methods are kept and to Monitor methods are lent.) If the parameter is kept it must belong to the same region as this, and there are no limitations on its use. If, however, the parameter is lent, the method may not store a reference to it in this, nor may it store this inside a lent Object. No cross-aliasing is allowed.

A method may also be marked new if it returns a freshly created object, which has no owner yet. Constructors are considered new unless they accept kept parameters.

Notice that you may have multiple references to the same Object, but they will all be within the same region. The only instances that may be passed between threads are Monitors and Values.

-Constructors

Guava final fields may either be initialized inside a constructor or in a private method that is only called from inside a constructor. (BTW, in D a private method is not private within a module, so the compiler would have to analyze the whole module to establish the latter condition.) [Note to self: The same scheme might be useful in the construction of immutable objects in D.]

Partially constructed Objects must not be visible outside constructors. The compiler must verify that constructors don't pass this to other methods, and don't store this inside other instances (no alias cross-contamination).

-Optimizations

Copying Values around may be expensive. I already mentioned one optimization, the use of the move operator. The other optimization is related to immutability. If a Value is immutable, it may be safely passed by reference. Guava defines immutable classes as ones that have no update methods. Any method that may modify this must be marked update. The update notation is also extended to method parameters; by default, parameters are immutable.

There is a bonus advantage to separating update methods from the rest. In a Monitor, a non-update method may safely use a readers' lock, which allows multiple readers to access data simultaneously, to increase concurrency.

-Global and local methods

A method is considered local if its effects cannot be observed or influenced by other threads. All Object and Value methods are by default local. A local method is immune to races thus allowing single-thread optimizations.

Conversely, all Monitor methods are considered global, since operations on Monitors may be visible or influenced by other threads.

These defaults may be overridden. For instance, an Object may contain a reference to a Monitor. The methods that call this Monitor's methods must be marked global. Moreover, Object or Value methods that access Monitors that are passed to them as arguments must also be marked global. So touching a Monitor (which is equivalent to using its lock) taints the whole caller tree with the global annotation.

This is similar to the way update taints the callers tree, except the update annotation of a method only pertains to this, not to its arguments. However, when global and update are combined, they have the same tainting power as global. In particular, a method that invokes a global update method on its argument becomes tainted with global update.

Methods that are global update cannot be invoked from non-update methods, even if they only global-update their arguments.

Note: This part of the paper is not very clear to me. The authors never explain the importance of global update methods (other than optimization opportunities).

-Limitations

Guava implements a relatively simple and somewhat limited system to eliminate races. It punts the problem of ownership-passing and lock-free programming. Even with those limitations the Guava type system is not simple.

The idea that safe multithreading may be achieved with a few simple modifications to the type system seems to be untenable. However, as long as special type annotations are limited to the code that does actual multithreading, I think they're worth the added complexity.

"

here's more on his idea: http://bartoszmilewski.wordpress.com/2009/05/26/race-free-multithreading/ :

"

Teaser #1

In one of my previous posts I described a concurrent data structure based on Haskell's MVar. It's essentially a one-element message queue. Let's have a peek at the implementation that uses my proposed multithreaded scheme. Notice the total absence of special annotations; even the synchronized keyword is missing. The only unusual thing is the move operator, :=, introduced in my previous blog. It is needed in case we want to pass unique objects through MVar. You are probably asking yourself, "How the heck is this supposed to work in a multithreaded environment?" Read on!

    class MVar<T> {
    private:
        T _msg;
        bool _full;
    public:
        // put: asynchronous (non-blocking)
        // Precondition: MVar must be empty
        void put(T msg) {
            assert (!_full);
            _msg := msg; // move
            _full = true;
            notify();
        }
        // take: If empty, blocks until full.
        // Removes the message and switches state to empty
        T take() {
            while (!_full)
                wait();
            _full = false;
            return := _msg;
        }
    }
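(For comparison, the real Haskell MVar this is modeled on: a one-element queue where take blocks while empty and put blocks while full.)

    import Control.Concurrent
    import Control.Concurrent.MVar

    main :: IO ()
    main = do
      box <- newEmptyMVar
      _   <- forkIO (putMVar box (42 :: Int))
      takeMVar box >>= print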

Let's instantiate this template as a monitor, that is, an object accessible from multiple threads. We do it by specifying the owner as self (the 'this' owner is not explicitly listed as a template parameter, but it can be passed to the template during instantiation).

auto mVar = new MVar<owner::self, int>;

In a self-owned object all methods are automatically synchronized. The move operator acting on an int simply copies it.

That was easy! How about instantiating a self-owned MVar for passing unique objects? Piece of cake!

auto q = new MVar<owner::self, unique Foo>;

    auto foo = new unique Foo;
    q.put(:= foo); // move foo
    assert (foo is null);

    // another thread
    unique Foo f2 = q.get(); // implicit move of rvalue

So far data races have been avoided by implicit synchronization in self-owned objects and by the use of move semantics for unique objects.

Of course you know how to break the MVar, right? Instantiate it with regular, non-unique class objects and watch aliases spread through threads. Let's try it:

auto mVar = new MVar<owner::self, Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
// now let me race through my local alias!
myFoo.method();

Surprise! This code will not compile, because a self-owned object cannot have a thread-local member. Because of a richer type system, the compiler can easily detect such violations. I'll explain the details of the ownership scheme in my future posts; for now let me assure you that, no matter how hard you try, you can't create a data race in this system (unless you bypass it with explicit casts).

How to make concurrent programming safer?

I propose to do this by extending the type system. (You've already seen some examples above.) I realize that there is strong resistance to extending a language's type system. The main problem is increased complexity. Programmers are reluctant to study and use complex annotation schemes. I will argue that, in the case of multithreading, the complexity argument is reversed. Shared-memory multithreading is hard! Figuring out how to avoid all the pitfalls of sharing is harder than learning a type system that prevents them.

If a programmer is not willing to learn or cannot understand the race-free type system, he or she should not be writing multithreaded programs.

How to limit complexity?

To limit the complexity of the type system, I came up with several simple principles. The most important one is that multithreaded extensions should not require modifications to single-threaded programs. That means coming up with reasonable defaults for all new annotations.

If necessary, the compiler should be able to derive some of the defaults from program analysis. Because modularity is very important, such analysis should not creep across compilation unit boundaries. Whole-program analysis is out of the question.

How to keep goals reasonable?

There are two major challenges in multithreaded programming:

   1. Avoiding races
   2. Preventing deadlocks

The first problem is approachable, whereas the second seems to be pie in the sky. I will therefore concentrate on designing a race-free system that is based on existing academic research.

Such a system would be incomplete without some support for lock-free programming. There are very good reasons for the language to enforce sequential consistency. For comparison, the C++ committee, after a long struggle, decided to allow non-SC primitives (weak atomics). Java only enforces SC for volatile variables.

Teaser #2

Speaking of lock-free programming, here's the implementation of the infamous Double-Checked Locking Pattern:

    class Singleton<T> {
    private:
        lockfree T _value;
    public:
        T get() lockfree {
            if (_value is null) {
                synchronized(this) {
                    if (_value is null)
                        _value = new T;
                }
            }
            return _value;
        }
    }

A lockfree variable is guaranteed to always be accessed atomically and in a sequentially consistent way. A lockfree method is not synchronized, but it can only operate on lockfree variables (or use a synchronized section). Even though the use of lockfree doesn't introduce low-level races, it may create high-level races when accessing more than one lockfree variable at a time. But we all know that lock-free programming is only for desperadoes.
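
(My note: the closest standard-Java analogue of a lockfree variable is java.util.concurrent.atomic; a sketch:)

    import java.util.concurrent.atomic.AtomicLong;

    class HitCounter {
        // Atomic variables give sequentially consistent, lock-free access,
        // much like the proposed "lockfree" qualifier.
        private final AtomicLong hits = new AtomicLong();

        void record() { hits.incrementAndGet(); } // atomic read-modify-write
        long current() { return hits.get(); }
    }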

Lessons learned from previous work

There are many commonalities in various approaches to race freedom.

Most of those ideas are expressible through type qualifiers.

Higher-level models

Explicit synchronization is hard, no doubt about it. In my proposal, the type system makes it safe. There are however other models of concurrency that, although more restrictive, are often easier to use.

One such model is based on message passing. In my opinion, support for message passing should be library-based. The race-free type system will make it safe and flexible. It will, for instance, support passing messages not only by value but also by reference without sacrificing safety. In traditional type systems there is no way to express the requirement that a message passed by reference must either be a monitor itself (have its own lock) or behave like a C++ unique_ptr (an object that leaves no aliases behind). This requirement should be expressible through the type system, allowing the compiler to check for its violations.

I've been paying a lot of attention to software transactional memory (STM). I believe that STM could be built into the language, again with the support of the type system. It looks like the hardest problem with STM is how it should inter-operate with other types of multithreaded access (both locked and lock-free). The simplest solution is to enforce full isolation: no STM object can ever be accessed outside of a transaction. It's not clear how practical such an approach is, but it definitely simplifies things to the point that STM becomes orthogonal to other types of synchronization. And that means it will be possible to tackle it at a later time, after the race-free multithreading model is firmly established.

In the next installment, I will describe the ownership scheme that is the foundation of the race-free type system.

Bibliography

C. Flanagan, M. Abadi, Object Types against Races. The seminal paper introducing ownership-based type systems.

See also my previous posts on Guava and GRFJ that discuss race free dialects of Java. "

java "synchronized" methods: monitors java "volatile" variables: accesses to this variable are automatically controlled by a lock. since you can only hold the lock during one read or write, you can't get a deadlock while waiting for it

http://www.javamex.com/tutorials/synchronization_concurrency_1.shtml

idea: make everything in one line of code atomic (acquire all locks atomically at the beginning of the line, and then use them all, and then release them all at the end). avoids this problem: http://stackoverflow.com/questions/461896/what-is-the-most-frequent-concurrency-problem-youve-encountered-in-java/466334#466334 "It can be easy to think synchronized collections grant you more protection than they actually do, and forget to hold the lock between calls. I have seen this mistake a few times:

 List<String> l = Collections.synchronizedList(new ArrayList<String>());
 String[] s = l.toArray(new String[l.size()]);

For example, in the second line above, the toArray and size() methods are both thread safe in their own right, but the size() is evaluated separately from the toArray(), and the lock on the List is not held between these two calls. If you run this code with another thread concurrently removing items from the list, sooner or later you will end up with a new String[] returned which is larger than required to hold all the elements in the list, and has null values in the tail. It is easy to think that because the two method calls to the List occur in a single line of code this is somehow an atomic operation, but it is not. "
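
(The fix, my sketch: lists from Collections.synchronizedList lock on the list object itself, so hold that lock across the compound operation:)

    import java.util.*;

    class SnapshotFix {
        static String[] snapshot(List<String> l) {
            // hold the list's own lock so size() and toArray()
            // observe the same state
            synchronized (l) {
                return l.toArray(new String[l.size()]);
            }
        }
    }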

in the Java Memory Model (JMM), variables which are not volatile and not accessed in synchronized methods apparently aren't required to be updated except thread-locally, i.e. http://stackoverflow.com/questions/461896/what-is-the-most-frequent-concurrency-problem-youve-encountered-in-java/477729#477729 "

Until I took a class with Brian Goetz I didn't realize that the non-synchronized getter of a private field mutated through a synchronized setter is never guaranteed to return the updated value. Only when a variable is protected by synchronized block on both reads AND writes will you get the guarantee of the latest value of the variable.

    public class SomeClass {
        private Integer thing = 1;

        public synchronized void setThing(Integer thing) {
            this.thing = thing;
        }

        /**
         * This may return 1 forever and ever no matter what is set
         * because the read is not synched.
         */
        public Integer getThing() {
            return thing;
        }
    }

"

why aren't variables "volatile" by default in java? inefficiency, i guess?

http://stackoverflow.com/questions/461896/what-is-the-most-frequent-concurrency-problem-youve-encountered-in-java

copied here b/c it's so useful:

"

What is the most frequent concurrency problem you've encountered in Java?

This is a poll of sorts about common concurrency problems in Java. An example might be the classic deadlock or race condition or perhaps EDT threading bugs in Swing. I'm interested both in a breadth of possible issues but also in what issues are most common. So, please leave one specific answer of a Java concurrency bug per comment and vote up if you see one you've encountered. Thanks! (Alex Miller)

47 Answers

The most common concurrency problem I've seen, is not realizing that a field written by one thread is not guaranteed to be seen by a different thread. A common application of this:

    class MyThread extends Thread {
      private boolean stop = false;

      public void run() {
        while (!stop) {
          doSomeWork();
        }
      }

      public void setStop() {
        this.stop = true;
      }
    }

As long as stop is not volatile or setStop is not synchronized this is not guaranteed to work. This mistake is especially devilish as in 99.999% of cases it won't matter in practice, as the reader thread will eventually see the change - but we don't know how soon it will see it.

answered Jan 20 at 19:09 Kutzi

A great solution to this is to make the stop instance variable an AtomicBoolean. It solves all the problems of the non-volatile, while shielding you from the JMM issues. (Kirk Wylie)

It's worse than 'for several minutes' -- you might NEVER see it. Under the Memory Model, the JVM is allowed to optimize while(!stop) into while(true) and then you're hosed. This may only happen on some VMs, only in server mode, only when the JVM recompiles after x iterations of the loop, etc. Ouch! (Cowan)

My #1 most painful concurrency problem ever occurred when two different open source libraries did something like this:

    // use matching strings in two different libraries
    private static final String LOCK = "LOCK";

    public void doSomeStuff() {
      synchronized (LOCK) {
        this.work();
      }
    }

At first glance, this looks like a pretty trivial synchronization example. However, because Strings are interned in Java, the literal string "LOCK" turns out to be the same instance of java.lang.String (even though they are declared completely disparately from each other). The result is obviously bad.

answered Jan 20 at 22:44 Jared

This is one of the reasons why I prefer private static final Object LOCK = new Object(); (Andrzej Doyle)

I love it - oh, this is nasty :) (Thorbjørn Ravn Andersen)

That's a good one for Java Puzzlers 2. (Dov Wasserman)

Actually... it really makes me want the compiler to refuse to allow you to synchronize on a String. Given String interning, there is no case where that would be a "good thing(tm)". (Jared)

@Jared: "until the string is interned" makes no sense. Strings don't magically "become" interned. String.intern() returns a different object, unless you already have the canonical instance of the specified String. Also, all literal strings and string-valued constant expressions are interned. Always. See the docs for String.intern() and §3.10.5 of the JLS. (Laurence Gonsalves)

A common problem is using classes like Calendar and SimpleDateFormat from multiple threads (often by caching them in a static variable) without synchronization. These classes are not thread-safe, so multi-threaded access will ultimately cause strange problems with inconsistent state.

answered Jan 20 at 16:17 Alex Miller

One classic problem is changing the object you're synchronizing on while synchronizing on it:

synchronized(foo) { foo = ... }

Other concurrent threads are then synchronizing on a different object and this block does not provide the mutual exclusion you expect.

answered Jan 20 at 16:02 Alex Miller

Ha... now that's a tortured description. "unlikely to have useful semantics" could better be described as "most likely broken". :) (Alex Miller)

Though probably not exactly what you are asking for, the most frequent concurrency-related problem I've encountered (probably because it comes up in normal single-threaded code) is a

java.util.ConcurrentModificationException

caused by things like:

    List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));
    for (String string : list) {
      list.remove(string);
    }

answered Jan 20 at 16:14 Fabian Steeg

Not properly synchronizing on objects returned by Collections.synchronizedXXX(), especially during iteration or multiple operations:

Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());

...

if(!map.containsKey("foo")) map.put("foo", "bar");

That's wrong. It should be:

synchronized(map) { if(!map.containsKey("foo")) map.put("foo", "bar"); }

Or with a ConcurrentMap implementation:

map.putIfAbsent("foo", "bar");

edited Jan 20 at 16:43 Alex Miller

Double-Checked Locking. By and large.

The paradigm, which I started learning the problems of when I was working at BEA, is that people will check a singleton in the following way:

    public class MySingleton {
      private static MySingleton s_instance;
      public static MySingleton getInstance() {
        if (s_instance == null) {
          synchronized (MySingleton.class) {
            s_instance = new MySingleton();
          }
        }
        return s_instance;
      }
    }

This never works, because another thread might have gotten into the synchronized block and s_instance is no longer null. So the natural change is then to make it:

  public static MySingleton getInstance() {
    if(s_instance == null) {
      synchronized(MySingleton.class) {
        if(s_instance == null) s_instance = new MySingleton();
      }
    }
    return s_instance;
  }

That doesn't work either, because the Java Memory Model doesn't support it. You need to declare s_instance as volatile to make it work, and even then it only works as of Java 5.

People that aren't familiar with the intricacies of the Java Memory Model mess this up all the time.

answered Jan 20 at 17:06 Kirk Wylie
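
(The working Java 5+ version of the above, per the volatile fix the answer describes; a sketch with a placeholder MySingleton class:)

    public class MySingleton {
        private static volatile MySingleton s_instance; // volatile is essential

        public static MySingleton getInstance() {
            if (s_instance == null) {
                synchronized (MySingleton.class) {
                    if (s_instance == null) {
                        s_instance = new MySingleton();
                    }
                }
            }
            return s_instance;
        }
    }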

Forgetting to call wait() (or Condition.await()) in a loop that re-checks that the awaited condition is actually true. Without this, you run into bugs from spurious wait() wakeups. Canonical usage should be:

 synchronized (obj) {
     while (<condition does not hold>) {
         obj.wait();
     }
     // do stuff based on condition being true
 }

answered Jan 20 at 16:25 Alex Miller

Another common bug is poor exception handling. When a background thread throws an exception, if you don't handle it properly, you might not see the stack trace at all. Or perhaps your background task stops running and never starts again because you failed to handle the exception.

answered Jan 20 at 17:06 Eric Burke

The most common bug we see where I work is programmers performing long operations, like server calls, on the EDT, locking up the GUI for a few seconds and making the app unresponsive.

answered Jan 20 at 17:01 Eric Burke

My biggest problem has always been deadlocks, especially caused by listeners that are fired with a lock held. In these cases, it's really easy to get inverted locking between two threads. In my case, between a simulation running in one thread and a visualization of the simulation running in the UI thread.

edited Jan 20 at 16:11 Dave Ray

Mutable classes in shared data structures

Thread 1:

    Person p = new Person("John");
    sharedMap.put("Key", p);
    assert(p.getName().equals("John")); // sometimes passes, sometimes fails

Thread 2:

    Person p = sharedMap.get("Key");
    p.setName("Alfonso");

When this happens, the code is far more complex than this simplified example. Replicating, finding and fixing the bug is hard. Perhaps it could be avoided if we could mark certain classes as immutable and certain data structures as only holding immutable objects.

answered Jan 20 at 18:05 Steve McLeod
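
(A sketch of the immutability fix the answer hints at: with all state final, no thread can mutate a published Person, and the JMM additionally guarantees safe publication of final fields:)

    final class Person {
        private final String name; // immutable state

        Person(String name) { this.name = name; }

        String getName() { return name; }

        // "mutation" produces a new object instead of changing this one
        Person withName(String newName) { return new Person(newName); }
    }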

It can be easy to think synchronized collections grant you more protection than they actually do, and forget to hold the lock between calls. I have seen this mistake a few times:

 List<String> l = Collections.synchronizedList(new ArrayList<String>());
 String[] s = l.toArray(new String[l.size()]);

For example, in the second line above, the toArray and size() methods are both thread safe in their own right, but the size() is evaluated separately from the toArray(), and the lock on the List is not held between these two calls. If you run this code with another thread concurrently removing items from the list, sooner or later you will end up with a new String[] returned which is larger than required to hold all the elements in the list, and has null values in the tail. It is easy to think that because the two method calls to the List occur in a single line of code this is somehow an atomic operation, but it is not.

edited Jan 21 at 18:08 Nick

Until I took a class with Brian Goetz I didn't realize that the non-synchronized getter of a private field mutated through a synchronized setter is never guaranteed to return the updated value. Only when a variable is protected by synchronized block on both reads AND writes will you get the guarantee of the latest value of the variable.

    public class SomeClass {
        private Integer thing = 1;

        public synchronized void setThing(Integer thing) {
            this.thing = thing;
        }

        /**
         * This may return 1 forever and ever no matter what is set
         * because the read is not synched.
         */
        public Integer getThing() {
            return thing;
        }
    }

answered Jan 25 at 14:07 John Russell

Multiple objects that are lock protected but are commonly accessed in succession. We've run into a couple of cases where the locks are obtained by different code in different orders, resulting in deadlock.

answered Jan 20 at 16:15 bcash

Thinking you are writing single-threaded code, but using mutable statics (including singletons). Obviously they will be shared between threads. This happens surprisingly often.

answered Jan 20 at 16:45 Tom Hawtin - tackline

Another common 'concurrency' issue is to use synchronized code when it is not necessary at all. For example, I still see programmers using StringBuffer or even java.util.Vector (as method local variables).

answered Jan 20 at 19:27 Kutzi

The dumbest mistake I frequently make is forgetting to synchronize before calling notify() or wait() on an object.

answered Jan 20 at 16:12 Dave Ray

I encountered a concurrency problem with Servlets when there are mutable fields which are set by each request. There is only one servlet instance for all requests, so this worked perfectly in a single-user environment, but when more than one user requested the servlet, unpredictable results occurred.

    public class MyServlet implements Servlet {
        private Object something;

        public void service(ServletRequest request, ServletResponse response)
                throws ServletException, IOException {
            this.something = request.getAttribute("something");
            doSomething();
        }

        private void doSomething() {
            // uses this.something ...
        }
    }

answered Jan 20 at 17:27 Ludwig Wensauer
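
(The standard fix, my sketch: keep per-request state in local variables and parameters, never in servlet fields:)

    import java.io.IOException;
    import javax.servlet.*;

    public class MyFixedServlet extends GenericServlet {
        public void service(ServletRequest request, ServletResponse response)
                throws ServletException, IOException {
            // local variable: each concurrent request gets its own copy
            Object something = request.getAttribute("something");
            doSomething(something);
        }

        private void doSomething(Object something) {
            // use the value passed in ...
        }
    }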

Using a local "new Object()" as mutex.

synchronized (new Object()) { System.out.println("sdfs"); }

This is useless.

answered Jan 20 at 17:38 Ludwig Wensauer

This is probably useless, but the act of synchronizing at all does some interesting things... Certainly creating a new Object every time is a complete waste. (TREE)

Arbitrary method calls should not be made from within synchronized blocks.

Dave Ray touched on this in his first answer, and in fact I also encountered a deadlock having to do with invoking methods on listeners from within a synchronized method. I think the more general lesson is that method calls should not be made "into the wild" from within a synchronized block - you have no idea if the call will be long-running, result in deadlock, or whatever.

In this case, and usually in general, the solution was to reduce the scope of the synchronized block to just protect a critical private section of code.

Also, since we were now accessing the Collection of listeners outside of a synchronized block, we changed it to be a copy-on-write Collection. Or we could have simply made a defensive copy of the Collection. The point being, there are usually alternatives to safely access a Collection of unknown objects.

answered Jan 20 at 19:12 Scott Bale

I believe in the future the main problem with Java will be the (lack of) visibility guarantees for constructors. For example, if you create the following class

class MyClass { public int a = 1; }

and then just read the value MyClass.a from another thread, MyClass.a could be either 0 or 1, depending on the JavaVM's implementation and mood. Today the chances for 'a' being 1 are very high. But on future NUMA machines this may be different. Many people are not aware of this and believe that they don't need to care about multi-threading during the initialization phase.

answered Jan 20 at 19:24 Tim Jansen
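
(My note: the JMM does give a visibility guarantee for final fields once the constructor completes, so a conservative fix is simply:)

    class MyClass {
        public final int a = 1; // final fields are guaranteed visible to other
                                // threads after construction completes, as long
                                // as "this" doesn't escape the constructor
    }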

Synchronizing on a string literal or constant defined by a string literal is (potentially) a problem as the string literal is interned and will be shared by anyone else in the JVM using the same string literal. I know this problem has come up in application servers and other "container" scenarios.

Example:

private static final String SOMETHING = "foo";

synchronized(SOMETHING) { }

In this case, anyone using the string "foo" to lock on is sharing the same lock.

answered Jan 20 at 20:55 Alex Miller

Not exactly a bug but, the worst sin is providing a library you intend other people to use, but not stating which classes/methods are thread-safe and which ones must only be called from a single thread etc.

More people should make use of the concurrency annotations (e.g. @ThreadSafe, @GuardedBy etc) described in Goetz's book.

answered Jan 21 at 11:22 Neil Bartlett

The most recent concurrency-related bug I ran into was an object that in its constructor created an ExecutorService, but when the object was no longer referenced, it had never shut down the ExecutorService. Thus, over a period of weeks, thousands of threads leaked, eventually causing the system to crash. (Technically, it didn't crash, but it did stop functioning properly, while continuing to run.)

Technically, I suppose this isn't a concurrency problem, but it's a problem relating to use of the java.util.concurrent libraries.

answered Jan 22 at 5:25 Eddie

Unbalanced synchronization, particularly against Maps, seems to be a fairly common problem. Many people believe that synchronizing on puts to a Map (not a ConcurrentMap, but say a HashMap) and not synchronizing on gets is sufficient. This however can lead to an infinite loop during re-hash.

The same problem (partial synchronization) can occur anywhere you have shared state with reads and writes however.

edited Jan 20 at 16:22 Alex Miller
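
(A sketch of the balanced alternative: either synchronize reads and writes on the same lock, or use a ConcurrentMap, which needs no external locking:)

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class BalancedSync {
        private final Map<String, String> map = new ConcurrentHashMap<String, String>();

        void write(String k, String v) { map.put(k, v); } // safe concurrently
        String read(String k) { return map.get(k); }      // safe concurrently
    }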

Not realising that java.awt.EventQueue.invokeAndWait acts as if it holds a lock (exclusive access to the Event Dispatch Thread, EDT). The great thing about deadlocks is that even if that happens rarely you can grab a stack trace with jstack or the like. I've seen this in a number of widely used programs (a fix to a problem I have only seen occur once in Netbeans should be included in the next release).

edited Jan 20 at 16:58 Tom Hawtin - tackline

Use of a global object such as a static variable for locking.

This leads to very bad performance because of contention.

answered Jan 20 at 16:03 kohlerm

Honestly? Prior to the advent of java.util.concurrent, the most common problem I routinely ran into was what I call "thread-thrashing": applications that use threads for concurrency, but spawn too many of them and end up thrashing.

answered Jan 20 at 16:11 Brian Clapper

Race conditions during an object's finalize/release/shutdown/destructor method and normal invocations.

From Java, I do a lot of integration with resources that need to be closed, such as COM objects or Flash players. Developers always forget to do this properly and end up having a thread call an object that has been shut down.

answered Jan 20 at 16:31 hamletdarcy

"

http://en.wikipedia.org/wiki/Concurrency_pattern

The active object design pattern decouples method execution from method invocation for objects that each reside in their own thread of control.[1] The goal is to introduce concurrency, by using asynchronous method invocation and a scheduler for handling requests.[2]

The pattern consists of six elements:[3] a proxy providing the client interface, a method request object encapsulating each invocation, an activation list (queue) of pending requests, a scheduler deciding which request to execute next, a servant implementing the methods, and a future through which the client receives the result.


The balking pattern is a software design pattern that only executes an action on an object when the object is in a particular state. For example, if an object reads ZIP files and a calling method invokes a get method on the object when the ZIP file is not open, the object would "balk" at the request. In the Java programming language, for example, an IllegalStateException? might be thrown under these circumstances.

There are some in this field who think this is more of an anti-pattern than a design pattern.
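
(A minimal Java sketch of balking, using a hypothetical ZipReader class:)

    class ZipReader {
        private boolean open = false;

        synchronized void open() { this.open = true; }

        synchronized String get() {
            if (!open) {
                // balk: refuse to act while in the wrong state
                throw new IllegalStateException("ZIP file is not open");
            }
            return "..."; // read an entry ...
        }
    }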


http://en.wikipedia.org/wiki/Double_checked_locking_pattern

Double-checked locking (from Wikipedia):

In software engineering, double-checked locking is a software design pattern also known as the "double-checked locking optimization".[1] The pattern is designed to reduce the overhead of acquiring a lock by first testing the locking criterion (the "lock hint") in an unsafe manner; only if that succeeds does the actual locking proceed.

The pattern, when implemented in some language/hardware combinations, can be unsafe. It can therefore sometimes be considered to be an anti-pattern.

It is typically used to reduce locking overhead when implementing "lazy initialization" in a multi-threaded environment, especially as part of the Singleton pattern. Lazy initialization avoids initializing a value until the first time it is accessed.

Usage in Java

Consider, for example, this code segment in the Java programming language as given by [3] (as well as all other Java code segments):

Single threaded version:

    class Foo {
        private Helper helper = null;

        public Helper getHelper() {
            if (helper == null)
                helper = new Helper();
            return helper;
        }

        // other functions and members...
    }

The problem is that this does not work when using multiple threads. A lock must be obtained in case two threads call getHelper() simultaneously. Otherwise, either they may both try to create the object at the same time, or one may wind up getting a reference to an incompletely initialized object. This is done by synchronizing, as is shown in the following example.

Correct but possibly expensive multithreaded version:

    class Foo {
        private Helper helper = null;

        public synchronized Helper getHelper() {
            if (helper == null)
                helper = new Helper();
            return helper;
        }

        // other functions and members...
    }

However, the first call to getHelper() will create the object and only the few threads trying to access it during that time need to be synchronized; after that all calls just get a reference to the member variable. Since synchronizing a method can decrease performance by a factor of 100 or higher[2], the overhead of acquiring and releasing a lock every time this method is called seems unnecessary: once the initialization has been completed, acquiring and releasing the locks would appear unnecessary. Many programmers have attempted to optimize this situation in the following manner:

   1. Check that the variable is initialized (without obtaining the lock). If it is initialized, return it immediately.
   2. Obtain the lock.
   3. Double-check whether the variable has already been initialized: if another thread acquired the lock first, it may have already done the initialization. If so, return the initialized variable.
   4. Otherwise, initialize and return the variable.

Broken multithreaded version ("Double-Checked Locking" idiom):

    class Foo {
        private Helper helper = null;

        public Helper getHelper() {
            if (helper == null) {
                synchronized (this) {
                    if (helper == null) {
                        helper = new Helper();
                    }
                }
            }
            return helper;
        }

        // other functions and members...
    }

Intuitively, this algorithm seems like an efficient solution to the problem. However, this technique has many subtle problems and should usually be avoided. For example, consider the following sequence of events:

   1. Thread A notices that the value is not initialized, so it obtains the lock and begins to initialize the value.
   2. Due to the semantics of some programming languages, the code generated by the compiler is allowed to update the shared variable to point to a partially constructed object before A has finished performing the initialization.
   3. Thread B notices that the shared variable has been initialized (or so it appears), and returns its value. Because thread B believes the value is already initialized, it does not acquire the lock. If the variable is used before A finishes initializing it, the program will likely crash.

One of the dangers of using double-checked locking in J2SE 1.4 (and earlier versions) is that it will often appear to work: it is not easy to distinguish between a correct implementation of the technique and one that has subtle problems. Depending on the compiler, the interleaving of threads by the scheduler and the nature of other concurrent system activity, failures resulting from an incorrect implementation of double-checked locking may only occur intermittently. Reproducing the failures can be difficult.

As of J2SE 5.0, this problem has been fixed. The volatile keyword now ensures that multiple threads handle the singleton instance correctly. This new idiom is described in [4]:

Works with acquire/release semantics for volatile; broken under Java 1.4 and earlier semantics for volatile:

    class Foo {
        private volatile Helper helper = null;

        public Helper getHelper() {
            if (helper == null) {
                synchronized (this) {
                    if (helper == null)
                        helper = new Helper();
                }
            }
            return helper;
        }

        // other functions and members...
    }

Many versions of the double-checked locking idiom that do not use explicit methods such as volatile or synchronization to communicate the construction of the object have been proposed, and all of them are wrong.[3][4]

Usage in Microsoft Visual C++

Double-checked locking can be implemented in Visual C++ 2005 if the pointer to the resource is declared with the C++ keyword volatile. Visual C++ 2005 guarantees that volatile variables will behave as fences, as in J2SE 5.0, preventing both compiler and CPU reordering of reads and writes[citation needed]. There is no such guarantee in previous versions of Visual C++. However, marking the pointer to the resource as volatile may harm performance elsewhere, if the pointer declaration is visible elsewhere in code, by forcing the compiler to treat it as a fence elsewhere, even when it is not necessary. "
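
(My note: in Java, the initialization-on-demand holder idiom avoids double-checked locking entirely, since class initialization is guaranteed to be lazy and thread-safe; a sketch, assuming the same Helper class as above:)

    class HolderFoo {
        private static class Holder {
            static final Helper INSTANCE = new Helper(); // runs once, on first use
        }

        public static Helper getHelper() {
            return Holder.INSTANCE; // triggers Holder's (thread-safe) init
        }
    }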


http://en.wikipedia.org/wiki/Guarded_suspension " In concurrent programming, guarded suspension[1] is a software design pattern for managing operations that require both a lock to be acquired and a precondition to be satisfied before the operation can be executed. The guarded suspension pattern is typically applied to method calls in object-oriented programs, and involves suspending the method call, and the calling thread, until the precondition (acting as a guard) is satisfied.

Usage

Because it is blocking, the guarded suspension pattern is generally only used when the developer knows that a method call will be suspended for a finite and reasonable period of time. If a method call is suspended for too long, then the overall program will slow down or stop, waiting for the precondition to be satisfied. If the developer knows that the method call suspension will be indefinite or for an unacceptably long period, then the balking pattern may be preferred.

Implementation

In Java, the Object class provides the wait() and notify() methods to assist with guarded suspension. In the implementation below, originally found in Kuchana (2004), if there is no precondition satisfied for the method call to be successful, then the method will wait until it finally enters a valid state.

    public class Example {
        synchronized void guardedMethod() {
            while (!preCondition()) {
                try {
                    // Continue to wait
                    wait();
                    // ...
                } catch (InterruptedException e) {
                    // ...
                }
            }
            // Actual task implementation
        }

        synchronized void alterObjectStateMethod() {
            // Change the object state
            // ...
            // Inform waiting threads
            notify();
        }
    }

An example of an actual implementation would be a queue object with a get method that has a guard to detect when there are no items in the queue. Once the "put" method notifies the other methods (for example, a get() method), then the get() method can exit its guarded state and proceed with a call. Once the queue is empty, then the get() method will enter a guarded state once again.
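
(A sketch of that queue, with the guard written out:)

    import java.util.ArrayDeque;
    import java.util.Queue;

    class GuardedQueue<T> {
        private final Queue<T> items = new ArrayDeque<T>();

        synchronized T get() throws InterruptedException {
            while (items.isEmpty()) { // guard: suspend until an item arrives
                wait();
            }
            return items.remove();
        }

        synchronized void put(T item) {
            items.add(item);
            notifyAll(); // inform waiting getters
        }
    }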


"A read/write lock pattern or simply RWL is a software design pattern that allows concurrent read access to an object but requires exclusive access for write operations."

---

"In computer programming, the scheduler pattern is a software design pattern. It is a concurrency pattern used to explicitly control when threads may execute single-threaded code.

The scheduler pattern uses an object that explicitly sequences waiting threads. It provides a mechanism to implement a scheduling policy, but is independent of any specific scheduling policy; the policy is encapsulated in its own class and is reusable.

The read/write lock pattern is usually implemented using the scheduler pattern to ensure fairness in scheduling.

"


"In computer programming, the thread pool pattern is where a number of threads are created to perform a number of tasks, which are usually organized in a queue. Typically, there are many more tasks than threads. As soon as a thread completes its task, it will request the next task from the queue until all tasks have been completed. The thread can then terminate, or sleep until there are new tasks available."


"Thread-local storage (TLS) is a computer programming method that uses static or global memory local to a thread.

This is sometimes needed because all threads in a process share the same address space. In other words, data in a static or global variable is normally always located at the same memory location, when referred to by threads from the same process. Variables on the stack however are local to threads, because each thread has its own stack, residing in a different memory location.

Sometimes it is desirable that two threads referring to the same static or global variable are actually referring to different memory locations, thereby making the variable thread local, a canonical example being the C error code variable errno.

"


"The reactor design pattern is a concurrent programming pattern for handling service requests delivered concurrently to a service handler by one or more inputs. The service handler then demultiplexes the incoming requests and dispatches them synchronously to the associated request handlers.

[edit] "

stm primitives: "atomic" blocks; "retry" command

oop stm in middle/end of: http://s3.amazonaws.com/dconf2007/DSTM.ppt


" [reply] Re: (OT) Programming languages for multicore computers by BrowserUk? (Saint) on May 06, 2009 at 10:07 UTC

      The following was started 2 days ago, but I've been unable to finish it. It was suggested to me I post it as-is anyway, and if there is sufficient interest, I'll try to complete it later.
      My first response to almost every question you've asked, is: It depends.
      It mostly depends upon the requirements of the algorithm and/or data that needs processing. There are essentially 3 basic types of parallelism (with many variations):
         1. IO parallelism:
            The simplest of them all. Mostly IO-bound.
            Typified by webservers and the like. Each execution context is essentially independent of all others.
            Each execution context spends most of its time waiting (a device or communications partner) for something to do. And when it does get something to do, it does it relatively quickly and then goes back to waiting.
         2. Task parallelism:
            Potentially the most complex. A mixture of IO and cpu.
            Typified by the Google Map-Reduce architecture. (Though that is (currently) applied at the macro (box) level).
            Each execution context runs a different algorithm on independent data, but they need to coordinate and synchronise between each other. Here the algorithms (tasks) are overlapped by passing the stream of data from one stage to the next in smallish chunks. Think of a pipeline where one stream of data needs to be processed by several different algorithms, but serially--read; decode; transform; encode; write.
         3. Data parallelism:
            Potentially the biggest gains. CPU-bound.
            Typified by image (X-ray; MRI;etc.) processing.
            One large homogeneous dataset needs one or two, often simple, algorithms applied to the entire dataset. Small subsets of the dataset are assigned to each execution context to perform the same algorithm(s) on. 
         1. This type lends itself to event driven/state machine/select methods.
            The problems arise when the process also needs to perform some cpu-intensive processing in addition to the IO-driven processing. This necessitates the programmer injecting artificial breaks into the cpu-intensive code in order to ensure timely servicing of the IO events.
            Whilst not onerous for a single program with exclusive use of the hardware, it becomes necessary to re-tune the software for each cpu, OS, and even application mix that this code will run on, making for costly porting and on-going maintenance.
         2. As mentioned above, this type is currently being dealt with at the macro-level.
            Using clusters of commodity hardware and disk-based 'pipelines' for shared storage. Whilst reasonably effective for the last, current, and possibly next generations of 1, 2, and 4-core commodity boxes with 2 to 4 GB of ram, the cost of all the disk-IO between stages will start to make itself felt as we move to larger numbers of cores and ram per box.
            Once you have cores ranging from 8 to 32; multiplied by simultaneous threading per core of 2 to 8; and anything from 64 to 512GB per box, it makes less and less sense to use processes and disk-based storage to handle the pipelines.
            When the machine can store an entire partition of the dataset entirely in memory, it is far quicker to load the dataset into memory once, and have each stage of the pipeline operate on it in place. You could do this by running each stage over the dataset serially, but as with the current scheme, handing over smallish chunks from stage to stage allows you to overlap the stages and vastly reduce the end-to-end processing time. So, running all the stages as threads, bucket-chaining over a single, large shared dataset is a natural evolution of the processes and disks model that brings huge efficiency saving for negligible extra infrastructure costs.
            Just a single shared integer between each stage that is incremented by the previous stage and read-only to the following stage. Indicating how far the previous stage has made it through the dataset, and where the following stage continues it next cycle. Utilises only simple, two-party condition signalling with no possibility of dead-locks, priority inversions or any of those other scare-monger nasties.
            Huge performance gains secured from avoiding inter-stage pipeline IO costs.
         3. This type is already happening in games and Computer Generated Imagery.
            It simply doesn't make sense to partition these kinds of datasets (images etc.) across processes, let alone boxes. The shared-memory model and threading is the only way to go. But again, the "big problems" of the shared memory model--deadlocking; priority inversion etc.--do not arise, because each thread operates exclusively on its own partition of the overall dataset.
            By linearly or recursively partitioning the dataset, no locking is required and each thread is free to run full-speed on its part of the problem to completion. The only synchronisation involved is the master thread waiting for the workers to finish before it does whatever needs doing with the results.
            More and more, this kind of processing will be offloaded to specialist processors (dsps; gpus; hereafter SPUs). However, with current setups this involves the transfer of data from cpu memory to SPU memory and back again. And with potentially multiple processes needing the SPU's services, and with memory access already the bottleneck on performance, it will make more sense in the near future for SPUs to do their thing by directly accessing the data in the main memory. Effectively making the SPU cores extensions of the main cpu. We're already seeing this with SIMD and similar instruction sets. 
      The future is threaded. We are just waiting for the software to catch up with the hardware. And I fear we are going to have to wait for the next generation of programmers before that problem will be properly fixed.
      Just as many of my generation have problems with using SMS--and with the language forms it generates--whilst it is simpler than tying shoelaces for the new generations. So it is with threading. Many in my generation only see the problems involved--not the potential.
      Just as it took the ubiquity of the web to drive the transition from the request-response model to the upcoming Web 2 era. So, it will require the ubiquity of threads and cores to drive the transition from forks&pipes to threads&shared state. It may take until the retirement of the Luddite generation for it to happen. But it will happen.
      As you've rightly indicated, one of the primary drivers of the transition will be the evolution of computer languages that give easy--and well abstracted--access to the potential. Whilst many of those you cite are adept at one or more of the types of concurrency, none of them are truly adept at all of them. And problems arise because real-world problems tend to require two or more of those types in the same application.
      A secondary problem with most existing language abstractions of concurrency is that they tend to take one of two tacks:
          o A low-level approach--typified by C/C++ and libraries like Intel Threading Building Blocks--whereby the programmer is given access to, and required to exercise, full control over all aspects of the threading.
            This not just enables, but forces the programmer to deal with all the issues of sharing state, not just for that data that needs to be shared, but all state! And that places the onus upon the programmer to ensure the segregation of per stage (per thread) state. A time consuming and rigorous task much better suited to automation by the compiler.
          o A high-level, encapsulated approach--typified by most of the concurrent FP languages like Haskell and Erlang--which removes most of the control from the programmers hands.
            The problem here is that the FP (and even const correct procedural) languages prevent the programmer (at least without resorting to extraordinary procedures (with often "dangerous" sounding names)), from sharing state for those data that need to be shared, with the result that the data has to be needlessly and expensively copied, often many times, as it transitions through multi-stage pipelined processes; or as multiple workers concurrently do the same processing on their own small chunks of the total dataset, and then they are 're-assembled' back to a whole.
            This compiler enforced (and needless) replication of state becomes a huge drain upon system resources via memory thrash, IO and communications protocols overheads. This can turn O(n) algorithms in to O(n^2) algorithms (or worse), once those overheads are factored into the equations. That detracts from, and can even completely negate, the benefits of concurrency. Add in the extra costs of concurrent development, maintenance and testing, and it is easy to see why people see little to be gained from the concurrency. 
      The solution is relatively simple, in concept at least. There needs to be a clear delineation between that state that needs to be shared--for efficient concurrency. And that state that mustn't be shared--for algorithmic safety. And the programmer must have explicit control over the former, whilst the compiler ensures and enforces the latter. In this way, thread procedures can be written without regard to the safety of local variables and state. Anything declared 'local', or 'non-shared', or better, any not explicitly marked as shared, is protected through compiler-enforced mechanisms from leaking between threads. At the same time, shared state can be passed around between threads as the needs of the algorithm dictate, without incurring huge penalties of copying involved in write-once variables.
      Another way of viewing the two concurrency abstractions is:
          o the former seeks to give the programmer a zillion tools for controlling access to shared state and synchronising control flows between threads.
            The problem here, beyond the well-known difficulties of getting this stuff right, is that the exposure and predominance of these mechanisms actively encourages programmers--especially those new to concurrency--to design their algorithms around utilising locks and synchronisation.
          o whilst the latter seeks to hide all such control within the abstraction beyond the programmers reach.
            With this approach, you end up with either weird and unintuitive programming constructs and paradigms; or huge, unwieldy and slow compile-time analysis engines; or belt&braces, copy-everything and lock-everything always--"whether it needs it or not"--infra-structure and library code. 
      There is a third answer to the problems of locking and synchronisation. Avoid them! Sounds too simple to be a point worth making, but it is a fact that most algorithms and applications that can benefit from concurrency can, with a little care, be architected such that they use little or no locking or synchronisation, despite that they may manipulate large volumes of shared state, in place. The trick is to program the application so that only one thread attempts to access any given piece of data at any given time. I'm not talking about using mutexes to prevent concurrent access. More, mechanisms that don't allow the programmer to try. But without throwing the (procedural) baby out with the bath water.
      This can be done today--in any language. It just requires the programmer to be aware of the techniques and apply them. But it would be a whole lot easier if programmers didn't have to become aware of the implementation details or re-invent them every time. To that end, there are several new abstractions that can help:
         1. Memory delegates: handles to (subsections of) real shared memory, that can be passed between threads, but which can only be used by the thread holding the delegate. Old copies of the delegate become disabled. Runtime enforcement is fatal. Compile-time detection easy.
         2. Thread-controlled shared variables:
            This variable is shared--by threads A & B (& F ...) ONLY!
         3. Access type controls on shared variables:
            This variable is shared between threads A & B, but only thread A can write to it and only thread B can read from it. And B cannot read from it until A has written it. And thread A cannot write to it again until thread B had read it.
         4. Bi-directional latching shared variables.
            Variable shared between threads A & B, readable and writable by both; but cannot read until A has written; and once B has read, neither can read (and A cannot write again) until B has written; now only A can read and neither can read again (and B cannot write again), until A has written.
            And so you encapsulate the request-response communications protocols in a single variable.
         5. Self limiting queues:
            Like thread controlled variables, which threads have access (and what type of access) is controlled, and both compile-time and run-time enforced. But in addition, the queues limit their own sizes such that any attempt to write when the queue is 'full' causes the writer to block. Any attempt to read when the queue is empty causes the reader to block.
            You get self-regulating synchronisation with tunable buffering.
         6. Declarative threading of functions for 'fire and forget' usage.
         7. Delayed (lazy) results from threads (promises).
            Instead of having explicitly create a thread passing the thread procedure address; retrieve the handle; and then later, explicitly call a blocking join to get the results. Instead, the above two combine to give something like:
            sub threadProc1 :threaded { ... }
            sub threadProc2 :threaded { ... }

            my $result1 = threadProc1( @args );
            my $result2 = threadProc2( @otherArgs );

            unless( defined $result1 && defined $result2 ) {
                ## continue doing other stuff while we wait
            }

            ## If the threads have completed by the time we reach this statement
            ## then the statement completes immediately. Otherwise it blocks until
            ## both results are available. Effectively joining both threads.
            my $composite = $result1 + $result2;
      With a few simple control and data sharing constructs such as these, all the scary, difficult parts of threading disappear under the covers, and the programmer can get back to concentrating upon their application whilst knowing that it will benefit from whatever cores and threads are available.
      You can imagine the pipeline example (read; decode; transform; encode; write), from above being written something along the lines of;
      my $pipeline :promise = TQmap {
          open $out, '>', $outFile;
          write( $out, $_ ) while $Qin->dequeue;
          close $out;
      } 1, 10,
      TQmap {
          while( $Qin->dequeue ) {
              my $encoded = encode( $_ );
              $Qout->enqueue( $encoded );
          }
      } 2, 20,
      TQmap {
          while( $Qin->dequeue ) {
              my $transformed = transform( $_ );
              $Qout->enqueue( $transformed );
          }
      }, 4, 40,
      TQmap {
          while( $Qin->dequeue ) {
              my $decoded = decode( $_ );
              $Qout->enqueue( $decoded );
          }
      }, 2, 20,
      TQmap { ## read
          open my $in, '<', $inFile;
          $Qout->enqueue( $_ ) while <$in>;
          close $in;
      }, 1, 10,
      undef;

      monitorKeyboard() until defined $pipeline;

      if( $pipeline != SUCCESS ) {
          log( error, $pipeline );
      }
      else {
          log( success, $pipeline );
      }
      exit;
      " --- http://perlmonks.org/?node_id=762207

http://www.computerworld.com.au/article/307418/-z_programming_languages_erlang :

" Some things we added to the language were with hindsight not so brilliant. I'd happily remove macros, include files, and the way we handle records. I'd also add mechanism to allow the language itself to evolve.

We have mechanisms that allow the application software to evolve, but not the language and libraries itself. We need mechanisms for revision control as part of the language itself. But I don't know how to do this. I've been thinking about this for a long time.

Instead of having external revision control systems like Git or Subversion I'd like to see revision control and re-factoring built into the language itself with fine-grain mechanism for introspection and version control. "

ampl language

scala vs java: http://www.computerworld.com.au/article/315254/-z_programming_languages_scala "The challenge was to identify constructs from one side with constructs from the other. For instance, a function value in a functional programming language corresponds to an object in an object-oriented language. Basically you could say that it is an object with an "apply" method. Consequently, we can model function values as objects.

Another example is in the algebraic data types of functional languages that can be modelled as class hierarchies on the object-oriented side. Also, the static fields and methods as found in Java. We eliminated these and modelled them with members of singleton objects instead. There are many other cases like these, where we tried to eliminate a language construct by matching and unifying it with something else. "

"What is your favourite feature of the language, if you had to pick? I don't think I can name a single favourite feature. I'd rather pick the way Scala's features play together. For instance, how higher-order functions blend with objects and abstract types, or how actors were made possible because functions in Scala can be subclassed. The most interesting design patterns in Scala come precisely from the interaction between object-oriented and functional programming ideas. "

wikipedia's eiffel page explains features analogous to haskell's type parameters (generics) and contexts (constraints that certain type parameters be in certain type classes) in a simple way:

"Names can, of course, be reused in different classes. For example the "+" operator is defined in several classes: INTEGER, REAL, STRING, etc. [edit] Genericity

Classes can be generic, to express that they are parameterized by types. Generic parameters appear in square brackets:

class LIST [G] ...

G is known as a "formal generic parameter". (Eiffel reserves "argument" for routines, and uses "parameter" only for generic classes.) With such a declaration G represents within the class an arbitrary type; so a function can return a value of type G, and a routine can take an argument of that type:

item: G do ... end
put (x: G) do ... end

The classes LIST [INTEGER] and LIST [WORD] are "generic derivations" of this class. Permitted combinations (with n: INTEGER, w: WORD, il: LIST [INTEGER], wl: LIST [WORD]) are

n := il.item
wl.put (w)

INTEGER and WORD are the "actual generic parameters" in these generic derivations.

It is also possible to have 'constrained' formal parameters, for which the actual parameter must inherit from a given class, the "constraint". For example in

   class HASH_TABLE [G, KEY -> HASHABLE]

a derivation HASH_TABLE [INTEGER, STRING] is valid only if STRING inherits from HASHABLE (as it indeed does in typical Eiffel libraries). Within the class, having KEY constrained by HASHABLE means that for x: KEY it is possible to apply to x all the features of HASHABLE, as in x.hash_code. "
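
A rough Python analogue of this constrained genericity uses typing.TypeVar with a bound in the role of "KEY -> HASHABLE" (the HashTable class here is a made-up illustration, not a library type):

    from typing import Dict, Generic, Hashable, TypeVar

    G = TypeVar("G")                      # unconstrained, like LIST [G]
    KEY = TypeVar("KEY", bound=Hashable)  # constrained, like KEY -> HASHABLE

    class HashTable(Generic[G, KEY]):
        def __init__(self) -> None:
            self._data: Dict[KEY, G] = {}

        def put(self, key: KEY, value: G) -> None:
            hash(key)                     # legal: KEY is known to be Hashable
            self._data[key] = value

        def item(self, key: KEY) -> G:
            return self._data[key]

    # like the "generic derivation" HASH_TABLE [INTEGER, STRING]:
    table: HashTable[int, str] = HashTable()
    table.put("three", 3)
    print(table.item("three"))            # 3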

"infecting nondeterminism" like brace expansion in shells but with infection; in shell, a{d,c,b}e -> ade ace abe;

;; for separator

make sure you can do this concisely: http://en.wikipedia.org/wiki/Delegation_pattern

toread: http://www.springerlink.com/content/hwua58hp0164kbep/ "A Marriage of Class- and Object-Based Inheritance Without Unwanted Children " http://www.springerlink.com/content/vj48mc681h0ctvel/ "Intersecting Classes and Prototypes"

"late binding of self": in OOP, a primary difference between true delegation and mere consultation is that, if the delegee code refers to "self", that refers to the delegator. see figure 1 in "Intersecting Classes and Prototypes"

In delegation, when the delegator object is called, the delegator calls the delegee code. Late binding of self means that if the delegee code refers to "self", that refers to the delegator object, not to the delegee object.
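
A minimal Python sketch of the difference (class names are made up): plain consultation forwards the call without rebinding self, while ordinary inheritance shows what the late-bound behavior looks like:

    class Delegee:
        def greet(self):
            return "hello from " + self.name()   # which name() runs?

        def name(self):
            return "delegee"

    class Consultor:                 # mere consultation, no late binding
        def __init__(self):
            self._impl = Delegee()

        def greet(self):
            # inside Delegee.greet, self is self._impl, so self.name()
            # resolves to Delegee.name, not Consultor.name
            return self._impl.greet()

        def name(self):
            return "delegator"

    print(Consultor().greet())   # "hello from delegee"

    class Delegator(Delegee):    # inheritance does late-bind self
        def name(self):
            return "delegator"

    print(Delegator().greet())   # "hello from delegator"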

mb UML is useful for graph data patterns: http://en.wikipedia.org/wiki/File:Aggregation-Composition3.png

"graph-oriented programming" http://www.springerlink.com/content/v16l843781g54283/ GOP: A Graph-Oriented Programming Model for Parallel and Distributed Systems http://docs.jboss.org/jbpm/v3/userguide/graphorientedprogramming.html

call data interfaces "views"? is this similar to "roles" http://en.wikipedia.org/wiki/Role-Oriented_Programming ?

roles: read Kasper B. Graversen's thesis: http://www.itu.dk/people/kbilsted/graversen06thesis.pdf

--- why Python doesn't have multiline lambda:

(a) they don't want to complicate the indentation-sensitive parser with statement blocks embedded inside expressions

(b) they don't want to add new syntactic primitives for this

-- http://stackoverflow.com/questions/1233448/no-multiline-lambda-in-python-why-not
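
The practical consequence: where a block language would pass a multi-statement anonymous function, Python expects a named one. A small made-up example:

    def apply_twice(f, x):
        return f(f(x))

    # not expressible as a lambda (two statements), so name it:
    def step(x):
        x = x + 1
        return x * 2

    print(apply_twice(step, 3))   # step(3) = 8, step(8) = 18 -> prints 18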

other optional semicolon proposals:

JavaScript has optional semicolons, but the rules for implicit semicolon insertion are complex: http://bclary.com/2004/11/07/#a-7.9

https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/XuMrWI0Q8uk

http://news.ycombinator.com/item?id=3092859

http://www.pieratnine.com/semicolons-are-not-optional

http://code.google.com/p/dart/issues/detail?id=34

http://www.cs.nmsu.edu/~jeffery/godiva/semi.html

--- http://regebro.wordpress.com/2009/07/12/python-vs-ruby/

http://wit.io/posts/the-ugliness-of-python

http://news.ycombinator.com/item?id=283639

--- " Outside of Common Lisp, Python ctypes may be the best FFI out there."

" ajross 1480 days ago


Blocks and lambdas

Yeah, but this is an area where both languages are just a mess. If python's lambda is castrated, ruby's notion of an anonymous function is tumorous. We've got blocks, procs, lambdas and methods? What's the difference?

Ruby mixes up the syntactic requirements of "passing" some code to a loop to iterate with, with the data requirements of binding a function to a scope. It's just a huge mess.

LISP and Scheme figured all this out decades ago, and yet only Javascript (and, almost, perl) among common scripting languages has managed to absorb the lesson. "

" LogicHoleFlaw? 1480 days ago


Lua gets it right too.


ajross 1480 days ago


As does nasal: http://plausible.org/nasal

That's my own language, and quite obscure (FlightGear uses it, and I've done quite a bit with it personally). But if you want to include Lua too, we might as well be consistent. I was trying to stay within languages with which most readers here would be familiar.


nostrademons 1480 days ago


As does Eve (http://eve-language.blogspot.com/), as long as we're throwing out pet languages. ;-) Though I just redid virtually everything about Eve (changed it from a multiple-dispatch functional language to a single-dispatch prototype-based functional language) and haven't yet pushed the changes to GitHub.

For that matter, OCaml/Haskell/Erlang all get it right, and those are not pet languages. And Cecil and Dylan almost get it right, but have some complications introduced by multimethods. So it's really just the popular languages that screw up.


" " Interestingly enough the language everyone loves to hate, PHP, as of 5.3 has figured this out as well."

---

" But ruby's methods really do suck, and so do the complex construction of Blocks. While I recognize that blocks allow you to do weird things (like use return on the calling scope), the fact that they necessitate a special form (yield) and cannot be bound to a variable, and are not passed as standard function parameters just makes them inconsistent and gross. "

-- " > Even Tcl "uplevel" is better than Python's castrated lambda.

Tcl, in its own way, is more powerful than both Ruby and Python in that only the syntax is a given. There is nothing special about 'if', 'while', and other control structures, which are not syntax, but commands. Indeed, you can write your own control structure commands in Tcl itself, with 'do ... while' being a classic example. "
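
Tcl can do this because it passes unevaluated scripts around; a rough Python approximation has to pass callables instead (do_while here is a made-up helper, not a builtin):

    # a user-defined 'do ... while' control structure as an ordinary
    # function; the body and condition are passed as callables
    def do_while(body, cond):
        body()
        while cond():
            body()

    counter = {"i": 0}
    def body():
        counter["i"] += 1
    def cond():
        return counter["i"] < 3

    do_while(body, cond)
    print(counter["i"])   # 3 -- body ran once unconditionally, then while cond held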

--

"

cben 1472 days ago


I want to focus on one point that I find an interesting study in language design:

> * I haven't found an expression that is better written as a list comprehension than as a map/reduce/select expression.

map/select are better iff you have a concise block syntax.

For mapping with a single ready function, all styles are reasonable:

    items.map foo
    items.map { |x| foo(x) }
    map(foo, items)
    [foo(i) for i in items]

Now suppose you want to modify the mapping function to foo(bar(i)). You could cascade it:

    items.map(bar).map(foo)
    map(foo, map(bar, items))

but this quickly gets out of hand and makes modifications painful. What you really want is a place to write arbitrary expressions instead of just a function name.

With comprehensions OR a good block syntax, it's a smooth transition:

    [foo(bar(i)) for i in items]
    items.map { |i| foo(bar(i)) }

But without one, the change is ugly and annoying:

    map(lambda i: foo(bar(i)), items)

So why wouldn't Python just embrace a nice block syntax? Precisely because this naturally leads to expressing most control structures as functions taking blocks!

Now that is not wrong in itself, but it is a question of taste. Pythonic taste favors a different approach: instead of passing the block directly to a function, have a few built-in control structures that can act as combinators.

E.g. don't write an "each" method that accepts a block - write an iterator, and use the built-in "for" loop as a bridge between the iterator and the block. Why would this be any better?

1. Because we can: it turns out that the vast majority of custom control structures can be sorted into a few patterns (iteration, pre/post guards, function wrapping), each served by one natural combinator ("for", "with", decorators). There is a price: it takes time until new patterns are recognized and the missing combinator is added (e.g. "with" is a very recent addition).

2. Decoupling: an iterator is a self-contained passive object that can be used outside of a for loop. An "each" method takes control and is less versatile.

3. Readability: seeing the combinator gives you an immediate idea about the style of flow control. This is just a matter of taste (Lisp and Ruby solve this with naming conventions).

4. Syntax taste: stupid as it sounds, passing blocks to functions is just alien to Python's syntax. Many smart people tried to marry them - and it wasn't a pretty sight...

But in the end you can't judge decisions of taste by pro/con arguments. You have to look at the sweet spots (and the sour spots) that arise in practice. A particularly nice sweet spot that arose in Python is the combination of reduction functions with generator expressions:

    sum(i**2 for i in range(10))
    any(p % i == 0 for i in range(2, p - 1))
    dict((v, k) for (k, v) in dict_to_invert.items())

This is very cool because the map-reduce pattern covers lots of useful computations. Of course it's just one case and YMMV. I'm sure Ruby has its own sweet spots...


"

Python's evaluation of None, 0, etc. as False:

" In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: False, None, numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets). All other values are interpreted as true. (See the __nonzero__() special method for a way to change this.) "

---

" Rights Amplification

[Image: Opening a can of Tuna]

There is one feature that most capability systems provide as a primitive but which is not motivated solely from pure object programming -- rights amplification. With rights amplification, the authority accessible from bringing two references together can exceed the sum of authorities provided by each individually. The classic example is the can and the can-opener -- only by bringing the two together do we obtain the food in the can.

Two common forms of rights amplification are sibling communication [Hardy, Gosling96, Shalit96] and sealer/unsealer pairs [Morris73, Miller87, Tribble95 Appendix D, Rees96]. E primitively provides sealer/unsealer pairs. The money example below builds sibling communication from sealer/unsealer pairs.

Sealer/unsealer pairs are similar in concept to public/private key pairs. The sealer is like an encryption key, and the unsealer like a decryption key. The provided primitive, makeBrandPair, makes and returns such a pair. When the sealer is asked to seal an object it returns an envelope which can only be unsealed by the corresponding unsealer.

    ? pragma.syntax("0.8")
    ? def makeBrandPair := <import:org.erights.e.elib.sealing.makeBrand>
    ? def [sealer, unsealer] := makeBrandPair("MarkM")
    # value: [<MarkM sealer>, <MarkM unsealer>]
    ? def envelope := sealer.seal("Tuna")
    # value: <sealed by MarkM>
    ? unsealer.unseal(envelope)
    # value: "Tuna"

If the envelope is the can and the unsealer is the can-opener (specific to this brand of cans), then Tuna is the food. (The name-string "MarkM" provided as an argument to makeBrandPair is purely for documentation and debugging purposes.) " -- http://www.erights.org/elib/capability/ode/ode-capabilities.html#simple-money
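
The sealer/unsealer pattern can be sketched in Python with closures, using the classic trick of an opaque envelope function that deposits its contents into a slot shared only with the matching unsealer (make_brand_pair is a made-up analogue of E's primitive, with none of E's actual security guarantees):

    def make_brand_pair(name):
        slot = []                      # shared only by this sealer/unsealer pair

        def seal(obj):
            # the envelope is opaque: calling it deposits obj into the
            # pair's private slot, but gives the caller no way to read it
            def envelope():
                slot.append(obj)
            return envelope

        def unseal(envelope):
            slot.clear()
            envelope()                 # a matching envelope fills our slot
            if not slot:
                raise ValueError("not sealed by %s's sealer" % name)
            return slot.pop()

        return seal, unseal

    sealer, unsealer = make_brand_pair("MarkM")
    envelope = sealer("Tuna")
    print(unsealer(envelope))          # Tuna

An envelope from a different brand pair deposits into that pair's slot, so our slot stays empty and unseal raises, which is the brand check.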

"

E sample:

    def makeMint(name) :any {
        def [sealer, unsealer] := makeBrandPair(name)
        def mint {
            to __printOn(out) :void { out.print(`<$name's mint>`) }
            to makePurse(var balance :(int >= 0)) :any {
                def decr(amount :(0..balance)) :void {
                    balance -= amount
                }
                def purse {
                    to __printOn(out) :void {
                        out.print(`<has $balance $name bucks>`)
                    }
                    to getBalance() :int { return balance }
                    to sprout()     :any { return mint.makePurse(0) }
                    to getDecr()    :any { return sealer.seal(decr) }
                    to deposit(amount :int, src) :void {
                        unsealer.unseal(src.getDecr())(amount)
                        balance += amount
                    }
                }
                return purse
            }
        }
        return mint
    }

(The "name" variable and the "__printOn" methods illustrate no security properties. They exist purely for debugging purposes.)

The guard declaration ":(int >= 0)" above only allows non-negative integers to be bound to the "balance" variable. The guard declaration ":(0..balance)" only allows an integer in the range 0 through balance inclusive to be bound to "amount". Guards form a soft typing system [Cartwright91] -- conceptually, they check their conditions at runtime, and so can express conditions as above that are beyond static verification. (When an implementation can statically verify or falsify a condition, it is encouraged to do so, to avoid the cost of a runtime check, and to provide the programmer with early information about what guards cannot/might/must fail.) " -- http://www.erights.org/elib/capability/ode/ode-capabilities.html
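
Conceptually, such guards are runtime checks on bindings; a rough Python sketch with a made-up guard() decorator:

    # a guard in the spirit of E's ":(int >= 0)": check a predicate on
    # the first argument before the binding is allowed to happen
    def guard(pred, desc):
        def deco(f):
            def wrapper(x, *args, **kwargs):
                if not pred(x):
                    raise ValueError("guard failed: expected " + desc)
                return f(x, *args, **kwargs)
            return wrapper
        return deco

    @guard(lambda n: isinstance(n, int) and n >= 0, "int >= 0")
    def make_purse(balance):
        return {"balance": balance}

    print(make_purse(10))    # {'balance': 10}
    # make_purse(-1)         # would raise ValueError at runtime, like a guard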

--

neat Haskell examples:

xsSorted = sortBy (compare `on` fst) xs -- observe the `on` combinator
(largest, smallest) = (maximum &&& minimum) xs
allCombinations = sequence . (replicate =<< length)

--

" up vote 2 down vote

I am a Scala programmer working with the language since over a year. I have been learning myself some Haskell lately, and here are a few observations so far:

    Scala is strict by default, and does not have polymorphic first-class values. This can sometimes lead to disasters such as this. Compare the Scala code in that post with this Haskell:

        const :: a -> b -> a
        const x _ = x

    Being curried by default, and not having a function-method dichotomy, makes Haskell more amenable to pure function composition. A few examples:

        xsSorted = sortBy (compare `on` fst) xs -- observe the `on` combinator
        (largest, smallest) = (maximum &&& minimum) xs
        allCombinations = sequence . (replicate =<< length)

    And here is one I just posted at StackOverflow. Here is the Scala code:

        def range(xs: List[Int], ys: List[Int]): List[List[Int]] = {
          (xs, ys).zipped.map((x, y) => List.range(x, y + 1)).sequence
        }

    Compare this with the following Haskell:

        range :: [Int] -> [Int] -> [[Int]]
        range = (sequence .) . zipWith enumFromTo

    All of these are contrived examples, but I hope they help get the point across.
    In Scala, ADTs are represented with classes and objects. This leads to problems such as Some(3)'s type being inferred as Some[Int]. So you end up adding extraneous type annotations such as Some(3) : Option[Int], or using some utility function such as some(3) (defined in Scalaz). In Haskell, Just 3 has type Maybe Int. (Just is merely a value constructor, not a type.)
    The "problem" I discussed in above bullet point has a small advantage too. Consider the following Haskell ADT:
    data D = A { a :: Int, b :: Int }
           | C { c :: Int }
    aob = A 4 5
    cob = C 9
    And here is the corresponding Scala:
    sealed trait D
    case class A(a: Int, b: Int) extends D
    case class C(c: Int) extends D
    val aob = A(4, 5)
    val cob = C(9) 
    In the Haskell version, c aob will compile but fail at runtime, as function c has type D -> Int, and not C -> Int (because C is not a type). On the contrary, an invocation such as aob.c in Scala will fail at compile time, as type A does not have a field named c. So this provides an extra bit of typesafety.
    In Haskell, Either a b has kind *, Either a has kind * -> *, and Either has kind * -> * -> *. So in Haskell, type constructors are curried. That's not the case with Scala, and so we need to resort to the notoriously ugly type lambda. For a one-level partial application of Either[A, B] in Scala, you have to write ({type L[X] = Either[A, X]})#L. This also has a small advantage: you can choose which type parameters you want to partially apply, for example ({type L[X] = Either[X, A]})#L. They can be chosen arbitrarily, unlike in Haskell where type constructors, being curried, can be partially applied only in a sequential manner. For arbitrary partial type application in Haskell, you have to introduce a type synonym.
    Similar to bullet number 5. Scala's partial application syntax for functions has a similar advantage over Haskell's curried by default scheme. No flips, instead just a few underscores sprinkled here and there.
    Unlike in Haskell, typeclasses in Scala are types themselves, and typeclass instances are first-class values. This allows for some interesting things, such as the one described here. In Scala, you can make instances of one typeclass implement another typeclass. For example, Scalaz makes the Show, Equal, and Resource typeclasses implement the typeclass Contravariant. This allows you to do stuff like:
    implicit val s: Show[Dog] = implicitly[Show[Int]] contramap ((_: Dog).age)
    Typeclasses in Scala are more powerful than those in Haskell in a few more respects. You can have multiple instances of one typeclass for the same type. Instances can be passed at call-site. You can precisely control the precedence level of typeclass instances by putting them in different traits and mixing them in appropriate order.

A subjective closure: when you are programming in a pure functional style, it's far easier to do so with Haskell. However, when your problem is more amenable to OO (i.e. can be solved better with subtyping) and requires a fair bit of mutability, you are better off with Scala. " -- http://programmers.stackexchange.com/a/112185