proj-oot-ootNotes6

https://www.semipublic.comp-arch.net/wiki/Bad,_Good,_and_Middling_Ideas_in_Computer_Architecture

https://www.semipublic.comp-arch.net/wiki/Design_Principles_and_Rules_of_Thumb

--

my notes on katherine coons's notes on tim sweeney's talk

Three Kinds of Code

game simulation (shared state concurrency): software transactional memory numeric computation: implicit thread parallelism (purely functional) shading: implicit data parallelism (SIMD)

  --

Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m) { Vertex[] Result = new Vertex[Indices.length]; for(int i=0; i<Indices.length; i++) Result[i] = Transform(m,Vertices[Indices[i]]); return Result; };

problems: each input may be NULL, Indices May contain indices outside of the range of the Vertex array, so Indices.length could dereference a NULL pointer, Vertices[Indices[i]] might be out of bounds., transform may hang

Dynamic failure in mainstream languages

Solved problems: Random memory overwrites Memory leaks (?)

Solvable: Accessing arrays out of bounds Dereferencing null pointers Integer overflow Accessing uninitialized variables

50% of the bugs in Unreal can be traced to these problems!

Transform{n:nat}(Vertices:[n]Vertex, Indices:[]nat<n, m:Matrix):[]Vertex= for each(i in Indices) Transform(m,Vertices[i])

the only problem left: transform may hang

how might this work?

  --

Dependent types

int nat nat<n

Dependent functions

Sum(n:nat,xs:[n]int)=.. a=Sum(3,[7,8,9])

Universal quantification

Sum{n:nat}(xs:[n]int)=.. a=Sum([7,8,9])

  --

(maybe/option types): Separating the “pointer to t” concept from the “optional value of t” concept

xp:^int xo:?int xpo:?^int

Advantages: You can’t dereference a NULL pointer The compiler can force you to do the appropriate error checks

?int x = lookup(t, k) FOUND(x) -> { use x … } NOTHING(x) -> { display error message of your choice

  --

Comprehensions (a la Haskell), for safely traversing and generating collections

Successors(xs:[]int):[]int= foreach(x in xs) x+1

Now we can’t have an index out of bounds error!

But this ties the Collections object and Iterator interface directly to the language - sacrifice abstraction

A guarded casting mechanism for cases where need a safe “escape”:

GetElement?(as:[]string, i:int):string= if(n:nat<as.length=i) as[n] else “Index Out of Bounds”

  --

Exceptions impose sequencing constraints on concurrent execution.

Dependent types and concurrency must evolve simultaneously

  --

Analysis of Unreal code

Usage of integer variables in Unreal: 90% of integer variables in Unreal exist to index into arrays 80% could be dependently-typed explicitly, guaranteeing safe array access without casting. 10% would require casts upon array access. The other 10% are used for: Computing summary statistics Encoding bit flags Various forms of low-level hackery

How true are these observations for other types of code?

“For” loops in Unreal: 40% are functional comprehensions 50% are functional folds Functional comprehensions (foreach) Functional folds: Operator to encapsulate simple patterns of recursion for processing lists

  --

Accessing uninitialized variables

class MyClass? { const int a=c+1; const int b=7; const int c=b+1; } MyClass? myvalue = new C; What is myvalue.a?

This is a frequent bug. Data structures are often rearranged, changing the initialization order. Lessons from Haskell: Lazy evaluation enables correct out-of-order evaluation Accessing circularly entailed values causes thunk reentry (divergence), rather than just returning the wrong value Lesson from Id90: Lenient evaluation (“An evaluation strategy under which all reducible expressions are evaluated in parallel except inside the arms of conditionals and inside lambda abstractions.”) is sufficient to guarantee this

  --

Evaluation Strategy Lenient evaluation is the right default. Support lazy evaluation through explicit suspend/evaluate constructs. Eager evaluation is an optimization the compiler may perform when it is safe to do so.

  --

Integer Overflow

The Natural Numbers : data Nat = Zero

Succ Nat

Factoid: C# exposes more than 10 integer-like data types, none of which are those defined by (Pythagoras, 500BC).

In the future, can we get integers right?

Neat Trick: In a machine word (size 2n), encode an integer ±2n-1 or a pointer to a variable-precision integer Thus “small” integers carry no storage cost Additional access cost is ~5 CPU instructions But: Array indexes don’t need this encoding Since ~80% of integers can dependently-typed to access into an array, the amortized cost is ~1 CPU instruction per integer operation. How true is this for other application domains?

  --

Reasonable type-system extensions could statically eliminate all: Out-of-bounds array access (comprehensions, dependent types) Null pointer dereference (optional variables) Integer overflow (dependent types with pointer access) Accessing of uninitialized variables (lenient evaluation)

  --

Memory model Garbage collection should be the only option

Exception Model The Java/C# “exceptions everywhere” model should be wholly abandoned All dereference and array accesses must be statically verifiable, rather than causing sequenced exceptions No language construct except “throw” should generate an exception

-- http://www.cs.utexas.edu/~coonske/presentations/fgp.ppt

--

Lazy operations construct a query execution plan

that is, The object being operated on can request to see the whole chain of lazy pending operations at once

--

Scala inner classes enclosing class, outer object, #

--

Scala nullary methods can be postfix (did they mean prefix? )

--

Also, go thru Scala tour and make why Oot additions

--

Scala

Overrides keyword

--

scala tour comprehensions example is verbose because comprehensions are verbose, pair constructor is verbose, no repr. Why Oot additions.

--

scala unapply does pattern matching with inverses

--

https://docs.google.com/presentation/d/15-7qFy6URdE7Owi2LitkQI_OHBu1AFWPUwHxgBc-O4E/preview?sle=true#slide=id.g1793dde48_1412

starting at slide 115

(ns jfdi (:use clojure.pprint))

;; so I can use pprint at REPL

jfdi> (+ 3 4) 7 jfdi> (+ 3 4 5 6 7) ;; takes variable arity 25 jfdi> (+) ;; including 0 0 jfdi> (+ 137 (* 32 3 25 13)) 31337

jfdi> (def six 6) ;; define a value

  1. 'jfdi/six jfdi> (defn square [x] (* x x)) ;; fn def.
  2. 'jfdi/square jfdi> (def square (fn [x] (* x x))) ;; alternative / identical jfdi> (square six) 36

jfdi> (defn factorial [n] (if (= n 0) 1 (* n (factorial (- n 1))))) jfdi> (factorial 5) 120 jfdi> (map factorial (range 8)) ;; (range 8) => (0 1 2 3 4 5 6 7) (1 1 2 6 24 120 720 5040)

jfdi> (defn factorial-2 [n] (reduce * (range 1 (inc n)))) ;; chance to show off reduce.

  1. 'jfdi/factorial-2 jfdi> (factorial-2 7) 5040

jfdi> (class 5) ;; Clojure data lives on JVM java.lang.Long jfdi> (class "five") ;; same as Java’s String java.lang.String jfdi> (class true) java.lang.Boolean jfdi> (class \C) java.lang.Character

;; symbols and lists are what code’s made of jfdi> (class '+) clojure.lang.Symbol jfdi> (list '+ 1 2 3) (+ 1 2 3) jfdi> (eval (list '+ 1 2 3)) 6

jfdi> (class :a) ;; string-like, interned, used for map fields. clojure.lang.Keyword jfdi> (def my-map {:white 1 :red 5 :blue 10})) jfdi> (get my-map :red) jfdi> (my-map :red) ;; map *is* a function jfdi> (:red my-map) 5 ;; all 3 return same result. jfdi> (get my-map :orange) nil ;; Java null

jfdi> [1 2 (+ 3 4)] ;; vector [1 2 7] jfdi> #{5 "five" "v" 5.0 "5"} ;; set

  1. {5.0 5 "five" "5" "v"} jfdi> (int-array 1024) ;; prim. arrays
  2. <int[] [I@5a3184d8>

jfdi> (def a (atom 0)) ;; mutable "cell" jfdi> @a ;; or: (deref a) 0 jfdi> (reset! a 38) jfdi> @a 38 jfdi> (swap! a inc) ;; swap! is atomic (preferred) 39

jfdi> (reset! a 0) jfdi> (dotimes [i 1e4] ;; 10k *conceptual* threads. (future (let [x @a] (reset! a (inc x))))) ;; what implicitly happens w/ naive imperative code / bad concurrency. jfdi> @a 9544 ;; concurrency fail jfdi> (reset! a 0) jfdi> (dotimes [i 1e4] (future (swap! a inc))) ;; atomic/correct jfdi> @a 10000 ;; correct

Not shown:

(def db [{:name "Ayla" :gender :female :level 74 :element :fire :badassery } {:name "Crono" :gender :male :level 53 :element :lightning :zombie true} {:name "Frog" :gender :male :level 60 :element :water} {:name "Lucca" :gender :female :level 61 :element :fire :glasses true} {:name "Marle" :gender :female :level 57 :element :water} {:name "Magus" :gender :male :level 70 :element :shadow :hot-sister-issues true} {:name "Robo" :gender :robot :level 54 :element :shadow :theme-song "rickroll"}])

jfdi> (map :name db) ;; SELECT name FROM db ("Ayla" "Crono" "Frog" "Lucca" "Marle" "Magus" "Robo") jfdi> (frequencies (map :gender db)) {:female 3, :male 3, :robot 1} jfdi> (group-by :gender db) {:robot [{:name “Robo”, :level 54, :element :shadow, …}] :female [<Ayla> <Marle> <Lucca>], :male [<Crono> <Frog> <Magus>]}

jfdi> (map :name (sort-by :level db)) ;; ...ORDER BY level ("Crono" "Robo" "Marle" "Frog" "Lucca" "Magus" "Ayla") jfdi> (defn female? [record] (= (:gender record) :female))

  1. 'jfdi/female? jfdi> (filter female? db) ;; ...WHERE gender = female ({:gender :female, :name "Ayla", :level 74, :element :fire, ...} {:gender :female, :name "Lucca", :level 61, :element :fire, …} {:gender :female, :name "Marle", :level 57, :element :water}) jfdi> (map (juxt :name :level) (filter female? db)) (["Ayla" 74] ["Lucca" 61] ["Marle" 57])

jfdi> (map :level (filter female? db)) (74 61 57) jfdi> (reduce + (map :level (filter female? db))) ; aggregation 192 jfdi> (defn average [coll] (/ (reduce + 0.0 coll) (count coll)))

  1. 'jfdi/average jfdi> (average (map :level (filter female? db))) ; summary stats 64.0

Motivation: runtime tracing of functions is a powerful debug technique. It’d require a framework to do this in Java, but in Clojure, a macro suffices!

jfdi> (defmacro tracing [code] `(do (println "The code was: " '~code) (let [ret# ~code] (println "Returning " ret#) ret#)))

2
do sequences multiple forms, returns last value (others ;; used for side effects!)
3
'~code: unquote to insert the code, quote to block ;; run-time evaluation.

jfdi> (tracing (+ 5 6)) The code was: (+ 5 6) Returning 11 11 jfdi> (+ 5 (tracing (* 3 6))) The code was: (* 3 6) Returning 18 23

jfdi> (defn buggy-* [x y] (inc (* x y)))

  1. 'jfdi/buggy-* jfdi> (defn buggy-fct [n] (reduce buggy-* (range 1 (inc n)))) jfdi> (buggy-fct 4) ;; because of bug in 41 ;; buggy-* jfdi> (with-redefs [buggy-* *] (buggy-fct 4)) 24 ;; correct. So buggy-* is the problem.

jfdi> (def buggy-*-root buggy-*) ;; avoid clobber jfdi> (with-redefs [buggy-* (fn [x y] (eval `(tracing (buggy-*-root ~x ~y))))] (buggy-fct 3)) ;; this is slightly insane & in practice there are better ways. ;; but this is something you’d need a bulky, bloated framework to do in e.g. Java! The code was: (jfdi/buggy-*-root 1 2) Returning 3 The code was: (jfdi/buggy-*-root 3 3) ;; as you see, the bug’s here. 3*3 != 10. Returning 10 10

slide 106:

(map #(+ % 2) [2 3 5 7 11])

Want map in Java?

interface Function1 { Object invoke (Object x); {

class AddTwo? implements Function1 { public Object invoke (Object x) { int res = (int)(Integer) x + 2; return res; } }

class Map { public static Object[] invoke (Function1 f, Object[] src) { Object[] dest = new Object(src.length]; for (int i = 0; i < src.length; i++) { dest[i] = f.invoke(src[i]); } return dest; } }

class JavaExample? { public static void main (String args[]) { Object[] data = new Integer[5]; data[0] = 2; .. {2, 3, 5, 7, 11} AddTwo? add2 = new AddTwo?[]; Object[] res = Map.invoke(add2, data); } }

---

Ygg2 1 day ago

link

What is my biggest gripe theoretical gripe about Scala is that Trait order is important! Because traits can override common function. A class that is e.g. Ball with traits Shiny and Red is NOT the same as e.g. Ball with traits Red and Shiny. Why? Why complicate testing to a point where you need not only test traits for correct behavior, and not just composing of Traits, but even the order in which they are composed? Why?! Why complicate it so much?

Note: It's been a while since I've seen traits, this might have changed for better.

To me Scala is essentially if Java, Haskell and C++ went to a wizard to have a child. The result is a magical mess. I seriously think that Scala has at least twice as many concepts as C++ and same if not greater number of really weird edge cases. I don't want to keep Scala in my mind, no more than I want to keep Necromicon.

reply

pkolaczk 1 day ago

link

As long as your traits are independent, the order does not matter. Anyway, how do you imagine this could be done better? For order to not matter, the linearization could not be possible and we'd end up with something much more complex and ugly like multiple inheritance in C++ with all the diamond inheritance problems, etc.

reply

epsylon 1 day ago

link

Actually, there's a very elegant solution to the diamond problem. Eiffel does this with a renaming mechanism. Martin Odersky certainly knew about Eiffel's solution when he designed Scala, but my guess is that, he chose the simpler solution of traits, just like most languages, because general multiple inheritance just isn't too useful.

Still, it's a shame that most people interested in programming languages have never heard about Eiffel, or ever read the superb book "Object-Oriented Software Construction" by Eiffel's creator, Bertrand Meyer. Eiffel has many flaws (the most proeminent being the complete lack of support for functional idioms), but it has very good ideas as well. (Perhaps the most popular influence Eiffel had on other languages is Design by Contract and the uniform access principle, which you see in most "modern" languages.)

reply

Ygg2 1 day ago

link

Don't have an answer right this minute, but I think eschewing/forbidding variables in trait would prevent the most obvious such collisions.

If you can't override variables, you can't silently change the structure to depend on trait order. You'd have to explicitly state order.

PS. Of course if you have several same signature methods that call super, which is generated depending on order, same problem persist. So, forbidding variables and calling super elements should deal with most "magical" interactions.

reply

pkolaczk 1 day ago

link

If you can't call super nor use variables then this takes quite a lot of power from traits. IMHO being able to stack many traits one on another is a nice pattern.

reply

Ygg2 1 day ago

link

Yeah, you trade power for easier testing and unforseen side-effects for determinism. IMO it's worth it.

reply

---

" That is, React builds the virtual DOM tree lazily for diffing based on what path in the tree actually changed.

As it turns out, the default shouldComponentUpdate implementation is extremely conservative, because JavaScript? devs tend to mutate objects and arrays! So in order to determine if some properties of a component have changed, they have to manually walk JavaScript? objects and arrays to figure this out.

Instead of using JavaScript? objects, Om uses ClojureScript? data structures which we know will not be changed. Because of this, we can provide a component that implements shouldComponentUpdate by doing the fastest check possible - a reference equality check. This means we can always determine if the paths changed, starting from the root, in logarithmic time.

"

(the point is that allowing the use of reference equality on immutabile objects can be much faster)

---

Csharp try pattern (i forgot exactly what this is, but i think it's a convention where the word "Try" is in the method name, and the return value is a bool indicating success or failure, and the real return value is placed into a variable that was passed in by reference (using an 'out' declaration))

Csharp has a distinction between input parameters which are declared 'ref' and which are declared 'out'. Both are passed by reference, but 'out' has the additional constraint that whatever value was in it when it is passed in must be ignored, e.g. it is ONLY there to put outputs into.


" [1] I tried Haskell and Elixir as candidates, but nested data holders with multiply circular references appear to be problematic to deal with in functional languages. Immutable data presents interesting challenges when it comes to cyclic graphs! The solutions suggested by the respective communities involved considerable boiler-plate. More importantly, the resulting code lost direct correspondence with the problem's structural elements. Eventually, I abandoned that approach.↩ "

asdasf 1 hour ago

link

His dismissal of haskell is rather misinformed. Haskell has no such problems with cyclic data structures. And this:

>Except that abstracting things into functions that take the `old' graph as input, and answer the `new' graph as output are not very convenient or easy.

Is just crazy. Yes, it is very convenient and easy: http://hackage.haskell.org/package/mtl-2.0.1.0/docs/Control-Monad-State-Lazy.html

reply

--

vjeux 13 hours ago

link

The biggest cause of bugs before we used React is the fact that the "state" of the app is leaking all over the place. Symptoms were things like looking for dom nodes that didn't exist, events that triggered on code that was no longer active, bugs due to the app being in a state extremely complex to reproduce with crazy conditions all over the place, race conditions...

React does many things to mitigate those problems and those classes of bugs nearly dropped to 0.

In the end, we found that React actually helped us write more reliable app compared to framework-less Javascript code.

reply

codygman 7 hours ago

link

Mutable state can be a huge time sink. It's why I'm putting more and more time into functional programming, since I think that Haskell (and Clojure) is closer to what programming in the future will be like.

I think many programmers don't respect how much complexity is reduced when state isn't leaking all over the place.

Of course there will always be times when you need state, but many times its worth jumping through a few hoops to make things work without state.

EDIT: If programmers learn C so they can think lower level, they should learn Haskell so they can think in terms of side effects, composability, and the advantages of pure functions. Just a reminder, pure functions can be created in any language though some (Clojure, Haskell) enforce purity.

reply

--

random idea from Doug, not sure if i like it, just writing it down to think about later:

" 3) A

interface representing the members shared by A and B. So if A has a *void Run()* method, and B has a *void Run()* method, but they are from independent inheritance heirarchies, Acommon ancestor of A and B, which implements an interface that contains a *void Run()* method. This would be a form of duck typing used to utilize the maximum amount of static information available to the compiler. "
B is a compiler generated type that implements a compiler generated
B is a class derived from a greatest

--

i dont want to write code that looks like English, but i do think there are some semantics in natural languages that might inspire programming language features.

For example, i say "I am Bayle" and also "I am human" and also "I am red". In programming languages these might not look so similar:

"am i bayle?" may be "i == bayle" "am i human?" may be "isSubclassOf(class(i), human)" "am i red?" may be "i.color == red"

There are times in programming when you don't want this distinction. For example, in some code i create an exception type ConnectionException?. Then i write some code where i catch an exception e and say "if class(e) == ConnectionException? then ...". But then i decide to make a further distinction and so i create a subclass of ConnectionException?, NetworkUnavailableException?. Now in many languages my 'if' statement won't trigger on NetworkUnavailableException?, because although NetworkUnavailableException? is a subclass of ConnectionException?, it is not equal to ConnectionException?, any more than the concept 'bayle' is identical to the concept of 'human'. So the code is less extensible than it could be. I should have said 'if isSubclassOf(class(e), ConnectionException?) then '.. just in case i ever wanted to subclass ConnectionException?, but that's harder to type, and easy to forget to do.

---

http://www.evanmiller.org/why-im-betting-on-julia.html

" Not only can you write code with the performance of C in Julia, you can take a peek behind the curtain of any function into its LLVM Intermediate Representation as well as its generated assembly code — all within the REPL. Check it out.

emiller ~/Code/julia (master) ./julia _ _ _ _(_)_

A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+261 (2013-11-30 12:55 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 97b5983 (0 days old master)
__/x86_64-apple-darwin12.5.0

julia> f(x) = x * x f (generic function with 1 method)

julia> f(2.0) 4.0

julia> code_llvm(f, (Float64,))

define double @julia_f662(double) { top: %1 = fmul double %0, %0, !dbg !3553 ret double %1, !dbg !3553 }

julia> code_native(f, (Float64,)) .section __TEXT,__text,regular,pure_instructions Filename: none Source line: 1 push RBP mov RBP, RSP Source line: 1 vmulsd XMM0, XMM0, XMM0 pop RBP ret

Bam — you can go from writing a one-line function to inspecting its LLVM-optimized X86 assembler code in about 20 seconds. "

---

idea from Hoon: an easy way to encode marshal any value into URL-safe form?

--

if you type 'oot' within a oot interpreter, it should work :)

--

" The most basic technique exploits the fact that on modern CPUs, either reading or writing a number in memory is atomic. By this I mean that combining a read with a write can lead to corruption, but doing either a read or a write alone does not. In the past, reading a multibyte number could lead to corruption, because in the nanoseconds between reading the first byte of a number another core could write to the second byte. This can no longer happen."

-- http://blog.erratasec.com/2013/02/multi-core-scaling-its-not-multi.html#.UwJE19um3PU

--

" The x86 processor has an assembly language instruction called “lock”. It’s not really it’s own instruction, but instead modifies the following instruction to be atomic. When the normal “add” instruction reads data from memory, adds to it, then writes the data back, another core could modify that memory location in the meantime, causing corruption. The “lock add” instruction prevents this from happening, guaranteeing the entire addition to be atomic.

Think of this as a “hardware mutex” rather than the traditional “software mutex”, only that it causes code to stop and wait for 30 instruction cycles rather than 30,000. By the way, the cost is because this is done within the L3 or “last level” cache. On current Intel CPUs, that’s about 30 clock cycles.

The “lock” prefix works only on a few arithmetic instructions, and only one value at a time. To work with more than one value, you need to use the “cmpxchg16b” instruction. What you do is first read 16 bytes of data. Then you make all your changes you want on that 16 bytes. Then using “cmpxchg16b”, you attempt to write all the changes back again. If that memory was changed in the meantime, this instruction fails and sets a flag. That way, you know synchronization failed, data would have been corrupted, and that you must back up and try again.

It’s 16-bytes because that’s the size of two pointers. It allows you to modify two pointers atomically, or a pointer plus an integer. This feature is called “CAS2” or “compare-and-swap two numbers”, and is the basis for a lot the “lock-free” stuff described below.

Intel’s new “Haswell” processor shipping in mid-2013 extends this model to “cmpxchg64b” or “cmpxchg128b”, where the regions of memory do not have to be next to each other. This feature is called “transactional memory”. This will make good, fast, scalable synchronization must easier in the future.

You don’t want to mess around with assembly language, especially since you want your code to run on both x86 and ARM. Therefore, compilers let you access these instructions with built-in functions. On gcc, example functions are __sync_fetch_and_add() and __sync_bool_compare_and_swap(). They work just as well on x86 as ARM. Microsoft has similar intrinsics for their compilers.

The above atomics act on one thing at a time. In practice, you need something more complex. For example, you might have 100 CPU cores trying to work off the same hash table, inserting things, removing things, and even resizing the entire table, all at the same time, all without requiring a core to stop and wait for another to finish.

The general term this goes under is “lock-free”. You don’t have to write hash-tables, linked-list, and other data structures yourself. Instead, you simply use libraries created by other people.

You can also link to large subsystems that incorporate lock-free inside. A good example are “heaps” or “malloc()”. The standard Microsoft heap has a global mutex that really saps performance on multi-core code. You can replace it with a lock-free heap simply by linking to another library. And such things tend to be cross platform.

You should be very afraid of doing this yourself, unless you are willing to study the problem in its entirety. It’s like crypto: people tend to make the same naïve mistakes. One example is the “ABA” problem. When doing a “compare-and-swap” like cmpxchg instruction mentioned above, sometimes the value changes, then changes back again. Thus, you think nothing else has changed, by it has. Another example is the “weak/strong memory model” problem: your lock-free code might work on x86 but fail on ARM. If you get the desire to write your own lock-free algorithms, google these issues, otherwise, they will bite you. " -- http://blog.erratasec.com/2013/02/multi-core-scaling-its-not-multi.html#.UwJE19um3PU

what's he talking about with transactional memory in Haskell? is cmpxchg128b transactional? i don't think so, i think it's:

http://software.intel.com/en-us/forums/topic/456300

TSX (Transactional Synchronization Extensions) and HLE (Hardware Lock Elision)

https://www.google.com/search?client=ubuntu&channel=fs&q=TSX%2FHLE&ie=utf-8&oe=utf-8

http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

http://www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions/

"

Haswell's TSX

The new Transactional Synchronization eXtensions (TSX) extend the x86 ISA with two new interfaces: HLE and RTM.

Restricted Transactional Memory (RTM) uses Xbegin and Xend, allowing developers to mark the start and end of a critical section. The CPU will thread this piece of code as an atomic transaction. Xbegin also specifies a fall back path in case the transaction fails. Either everything goes well and the code runs without any lock, or the shared variable(s) that the thread is working on is overwritten. In that case, the code is aborted and the transaction has failed. The CPU will now execute the fall back path, which is most likely a piece of code that does coarse grained locking. RTM enabled software will only run on Haswell and is thus not backwards compatible, so it might take a while before this form of Hardware Transactional Memory is adopted.

The most interesting interface in the short term is Hardware Lock Elision or HLE. It first appeared in a paper by Ravi Rajwar and James Goodman in 2001. Ravi is now a CPU architect at Intel and presented TSX together with his colleague Martin Dixon TSX at IDF2012.

The idea is to remove the locks and let the CPU worry about consistency. Instead of assuming that a thread should always protect the shared data from other threads, you optimistically assume that the other threads will not overwrite the variables that the thread is working on (in the critical section). If another thread overwrites one of those shared variables anyway, the whole process will be aborted by the CPU, and the transaction will be re-executed but with a traditional lock.

If the lock removing or elision is successful, all threads can work in parallel. If not, you fall back to traditional locking. So the developer can use coarse grained locking (for example locking the entire shared structure) as a "fall back" solution, while Lock Elision can give the performance that software with a well tuned fine grained locking library would get.

 According to Ravi and Martin, the beauty is that the developer of your locking libraries simply has to add a few HLE instructions without breaking backwards compatibility. The developer uses the new TSX enabled library and gets the benefits of TSX if his application is run on Haswell or a later Intel CPU."

http://www.anandtech.com/show/6290/making-sense-of-intel-haswell-transactional-synchronization-extensions/3 suggests that HLE is done by adding two new prefixes to other assembly commands: XACQUIRE and XRELEASE to tell the CPU which commands are acquiring and releasing locks. Presumably HLE only executes these if the transaction fails.

RTM apparently uses Xbegin and Xend to mark the beginning and ending of transactions, with Xbegin also specifying a "fall back path in case the transaction fails..., which is most likely a piece of code that does coarse grained locking"

" Brutalizer - Saturday, September 22, 2012 - link Sun built Transactional Memory years ago with their SPARC ROCK cpu, to be used in the new SuperNova? Solaris servers. But for various reasons ROCK was killed (I heard that it used too much wattage, among other problems).

The good news is that ROCK research is not wasted, most of it is used in newer products. The new coming SPARC T5 to be released this year, has Transactional Memory. It will probably be the first sold cpu that offers TM. "

" Hardware Lock Elision

Hardware Lock Elision (HLE) adds two new instruction prefixes, XACQUIRE and XRELEASE. These two prefixes reuse the opcodes of the existing REPNE / REPE prefixes (F2H / F3H). On processors that do not support TSX, REPNE / REPE prefixes are ignored on instructions for which the XACQUIRE / XRELEASE are valid, thus enabling backward compatibility.[8]

The XACQUIRE prefix hint can only be used with the following instructions with an explicit LOCK prefix: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction can be used without the LOCK prefix as well.

The XRELEASE prefix hint can be used both with the instructions listed above, and with the MOV mem, reg and MOV mem, imm instructions.

HLE allows optimistic execution of a critical section by eliding the write to a lock, so that the lock appears to be free to other threads. A failed transaction results in execution restarting from the XACQUIRE-prefixed instruction, but treating the instruction as if the XACQUIRE prefix were not present. Restricted Transactional Memory

Restricted Transactional Memory (RTM) is an alternative implementation to HLE which gives the programmer the flexibility to specify a fallback code path that is executed when a transaction cannot be successfully executed.

RTM adds three new instructions: XBEGIN, XEND and XABORT. The XBEGIN and XEND instructions mark the start and the end of a transactional code region; the XABORT instruction explicitly aborts a transaction. Transaction failure redirects the processor to the fallback code path specified by the XBEGIN instruction, with the abort status returned in the EAX register. EAX Register Bit Position Meaning 0 Set if abort caused by XABORT instruction. 1 If set, the transaction may succeed on a retry. This bit is always clear if bit 0 is set. 2 Set if another logical processor conflicted with a memory address that was part of the transaction that aborted. 3 Set if an internal buffer overflowed. 4 Set if debug breakpoint was hit. 5 Set if an abort occurred during execution of a nested transaction. 23:6 Reserved. 31:24 XABORT argument (only valid if bit 0 set, otherwise reserved). XTEST instruction

TSX provides a new XTEST instruction that returns whether the processor is executing a transactional region.

" -- http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

--

"

erichocean 12 hours ago

link

What's significant to me is that you can do this stuff today on stock Linux. No need to run weird single-purpose kernels, strange hypervisors, etc.

You can SSH into your box. You can debug with gdb. Valgrind. Everything is normal...except the performance, which is just insane.

Given how easy it is, there isn't really a good excuse anymore to not write data plane applications the "right" way, instead of jamming everything through the kernel like we've been doing. Especially with Intel's latest E5 processors, the performance is just phenomenal.

If you want a fun, accessible project to play around with these concepts, Snabb Switch[0] makes it easy to write these kinds of apps with LuaJIT?, which also has a super easy way to bind to C libraries. It's fast too: 40 million packets a second using a scripting language(!).

I wrote a little bit about a recent project I completed that used these principles here: https://news.ycombinator.com/item?id=7231407

[0] https://github.com/SnabbCo/snabbswitch

reply

"

--- http://blog.interfacevision.com/design/design-visual-progarmming-languages-snapshots/

	https://news.ycombinator.com/item?id=7274674

zachrose 2 days ago

link

It's interesting that there appear to be two main ideas: text coding but with less typing and more syntax highlighting (Squeak), and "draw a network" (LabView?, Pure Data).

reply

seanmcdirmid 2 days ago

link

There are actually more than a few themes:

[1] http://pages.cs.wisc.edu/~fischer/papers/synthesizer.pdf

[2] http://www.templetons.com/brad/alice.html

[3] http://web.engr.oregonstate.edu/~burnett/Forms3/forms3.html

[4] http://www.agentsheets.com/

[5] http://kurlander.net/DJ/Projects/Chimera/resources.html

[6] http://www.sigchi.org/chi95/proceedings/papers/ac1bdy.htm

[7] http://research.microsoft.com/en-us/projects/kodu/

[8] http://www.toontalk.com/

reply

---

" Then, I often find myself writing GUI code. I wouldn't know how to start writing a GUI in Erlang, but it's trivial in Python or C++.

If anything, I would need a language centered around mutability, so almost the opposite of erlang. A language where everything is mutable, where you can databind to any object. Where you can just make an array of objects and bind it to a graphical widget, and get create, edit, update, delete operations for free. You never have to write `listview.insert(...)`. Maybe the command pattern is part of the language and it is trivial to write an undo function. And finally, it would include multithreading, with only a simple syncronization primitive, the transaction. The goal of concurrency here would not be speed, but GUI responsiveness. " -- https://news.ycombinator.com/item?id=7277974

--

mrcharles 1 day ago

link

On the game I'm currently working on, it's built very heavily around Lua. So for the save system, we simply fill a large Lua table, and then write that to disk, as Lua code. The 'save' file then simply becomes a Lua file that can be read directly into Lua.

This is absolutely amazing for debugging purposes. Also you never have to worry about corrupt save files or anything of it's ilk. Development is easier, diagnosing problems is easier, and using a programmatic data structure on the backend means that you can pretty much keep things clean and forward compatible with ease.

(Oh, also being able to debug by altering the save file in any way you want is a godsend).

reply

JackC? 1 day ago

link

You probably know this, but remember that storing user data as code is a place where you (general "you") have to think very carefully about security.

Is there any way that arbitrary code in the file could compromise the user's system? If so, does the user know to treat these data files as executables? Is there any way someone untrusted could ever edit the file without the user's knowledge? Even in combination with other programs the user might be running? Are you sure about all of that?

Maybe Lua in particular is sandboxed so that's not a problem (beats me), but in general this is an area where safe high-level languages can all of a sudden turn dangerous. Personally I would rarely find it worth it.

reply

--

_DEFAULT_PORT = 1234

class SomeProtocol?: ... def __enter__(self): self._client = socket() self._client.connect( (self.host, self.port or _DEFAULT_PORT) ) return self

  1. If you want to subclass SomeProtocol?, you would have to overwrite every method!
  2. Better class SomeProtocol?: _default_port = 1234 ... def __enter__(self): self._client = socket() self._client.connect( (self.host, self.port or self._default_port))

--

http://stevenloria.com/python-best-practice-patterns-by-vladimir-keleshev-notes/

--

http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html

--

an interesting PHP vulnerability:

https://blog.nearlyfreespeech.net/2009/11/05/a-php-include-exploit-explained/

---