proj-oot-ootAssemblyNotes10

perhaps, in addition to ALIASed memory locs, we should have memory locs that go thru getters/setters? That's kind of a generalization of boxing, come to think of it, or mb a special case (you call some program-specific boxed thingee whenever you want to read or write the boxed quantity; a different call for read and for write; the program can do the boxing/unboxing differently depending on which object it is). Also i guess this generalizes aliasing, because if it's boxed you can do aliasing in the boxing. Similarly, you can do the COW stuff this way. I like it.

i guess this really just IS boxing. But it's cool.

Maybe i'm persuaded to make boxing the other addr mode bit.
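A minimal sketch of the idea, in Python rather than anything OVM-specific: a memory location whose reads and writes each go through custom getter/setter code, with aliasing and COW falling out as special cases. All names here (BoxedCell, alias_box, cow_box) are made up for illustration.

```python
class BoxedCell:
    """A memory location whose read and write each call custom code."""
    def __init__(self, read_fn, write_fn):
        self.read_fn = read_fn
        self.write_fn = write_fn

    def read(self):
        return self.read_fn()

    def write(self, value):
        self.write_fn(value)

def alias_box(memory, target_addr):
    """Aliasing as a special case: the box just forwards to another location."""
    return BoxedCell(
        read_fn=lambda: memory[target_addr],
        write_fn=lambda v: memory.__setitem__(target_addr, v))

def cow_box(shared, index):
    """Copy-on-write as a special case: share until the first write."""
    state = {'data': shared, 'owned': False}
    def read():
        return state['data'][index]
    def write(v):
        if not state['owned']:
            state['data'] = list(state['data'])   # take a private copy on first write
            state['owned'] = True
        state['data'][index] = v
    return BoxedCell(read, write)

mem = [0] * 8
cell = alias_box(mem, 3)
cell.write(42)
assert cell.read() == 42 and mem[3] == 42

shared = [1, 2, 3]
c = cow_box(shared, 0)
c.write(99)
assert c.read() == 99 and shared[0] == 1   # the shared original is untouched
```

The point being that once every access goes through a pair of functions, aliasing, COW, GC barriers etc are just particular choices of those functions.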

---

so i removed the following from the assembly spec:

To explicitly indicate addressing mode, use #MODE (where MODE is either an unsigned decimal integer or hex literal) before the operand, for example:

ADD #2 x, #0 2, #0 2

Note that when #MODE is used, you can use any of the previous syntaxes; the result will be compiled into a 12-bit operand according to the above addressing-mode syntaxes, but then the actual addressing mode will be set via #MODE.

To use syntax to indicate the 3 low-order bits of the addressing mode but to manually indicate the high-order bit is 1, use '##', eg the following indicates the high-bit-set variants of register, indirect-read-post-increment (POP), and indirect-write-pre-decrement:

ADD ## x, ## POP, ## z--

in other words the previous is equivalent to:

ADD #12 x, #13 POP, #14 z--

or equivalently:

ADD #c x, #d POP, #e z--

ok so i was thinking above that since we can do so much with boxed mode (eg getters/setters, ALIAS, GC, dynamic type tags, COW), maybe i'll make it official. I guess the distinction between 'boxed' address modes and 'custom addr modes' is whether the addr mode itself gets passed to the user-provided custom function, or whether the function only gets told the effective address (if a write) or value (if a read) (hmm, what if there is a write to register 0, ie a DISCARD? does that even get passed? Probably not, right). So boxing is strictly less expressive than custom address modes. Which is fine, since boxing is already so powerful, and since i can't immediately think of any non-efficiency-related uses for full custom addr modes.
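A sketch of that expressiveness gap: the VM core resolves the addressing mode first, so a boxing handler never learns which mode produced the value it receives. Everything here (the mode names, the machine layout) is hypothetical.

```python
REGISTER, INDIRECT = 0, 1   # made-up mode numbers

def read_operand(machine, mode, operand, box_read):
    # mode resolution happens in the VM core...
    if mode == REGISTER:
        raw = machine['regs'][operand]
    elif mode == INDIRECT:
        raw = machine['mem'][machine['regs'][operand]]
    # ...and only the resolved value reaches the boxing handler, which is
    # why boxing is strictly less expressive than a full custom addr mode
    # (a custom mode handler would get 'mode' and 'operand' themselves).
    return box_read(raw)

machine = {'regs': [7, 2], 'mem': [0, 0, 99]}
unbox = lambda raw: raw * 10   # stand-in boxing handler
assert read_operand(machine, REGISTER, 0, unbox) == 70
assert read_operand(machine, INDIRECT, 1, unbox) == 990
```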

Now, these boxing functions. When we read or write in a boxed address mode, some custom code gets called. Is this custom code determined by: the value being accessed, the static type of the value, the address being accessed, the currently executing module, something chosen dynamically, something chosen once per program, or the address subspace?

The value being accessed would probably be the most expressive choice. But that would require passing around extra information with each such value. Otoh, the value is already 'boxed' anyways.

The static type of the value being accessed is less dynamic but also very expressive. But that would require the implementation to care quite a bit about types, and to effectively support polymorphism. That's the kind of complicated thing that we wanted the Oot runtime to do for us, right?

The address being accessed also sounds good, but it would require a big table of which boxing handler goes with which address.

The currently executing module sounds okay, but doesn't help us when data is being passed from one module to the next. It seems pretty useless for some library to be able to define its own boxing if that means that boxed values that it creates must be treated as opaque by others, and boxed values that it receives from others must be treated as opaque by it.

Dynamic sounds like it wouldn't be a ton of use (except for 'bootstrapping' type stuff) and would impede static optimization.

Choosing once per program sounds okay; this would aid static optimization, and it would allow us to fit much of the other stuff into the per-program handler if we wanted it (which is better than making the Oot Bytecode implementer do it).

i kinda want it to be per subaddress space (non-directly-accessible address space), eg different spaces of the opaque integer addresses could have different boxing functions, but within each subaddress space it's the same.

How would that work? Maybe when you malloc, you pass in the boxing functions. Then either the opaque addresses are actually tuples that point to the boxing functions too, or the boxed objects in the space have fields that point to the boxing functions (actually, the latter idea is back to per-value, not per-address space).
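The 'pass in the boxing functions at malloc time' variant could be sketched like this, with opaque addresses as (subspace, offset) tuples. All names (Subspace, ovm_malloc, load, store) are hypothetical.

```python
class Subspace:
    """An address subspace with its own boxing functions, fixed at malloc time."""
    def __init__(self, box_read, box_write):
        self.box_read = box_read
        self.box_write = box_write
        self.cells = {}

spaces = []

def ovm_malloc(n, box_read, box_write):
    """Allocate n cells in a fresh subspace; return an opaque (subspace_id, offset) address."""
    sp = Subspace(box_read, box_write)
    sp.cells = {i: 0 for i in range(n)}
    spaces.append(sp)
    return (len(spaces) - 1, 0)

def load(addr):
    sid, off = addr
    sp = spaces[sid]
    return sp.box_read(sp.cells[off])

def store(addr, value):
    sid, off = addr
    sp = spaces[sid]
    sp.cells[off] = sp.box_write(value)

# A subspace whose boxing functions tag every stored value:
a = ovm_malloc(4, box_read=lambda c: c[1], box_write=lambda v: ('tagged', v))
store(a, 5)
assert load(a) == 5
assert spaces[a[0]].cells[0] == ('tagged', 5)
```

Note the 'pointer arithmetic' here is logical, as discussed below: +1 on the offset gets the next cell of the subspace, not the next byte.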

This is sounding a lot like classes. 'Passing in boxing functions at malloc time' sounds a lot like 'constructing a new instance (which includes allocating for it) by specifying which class it is'. If we're doing something like that, we want to be able to determine the class of a given value at compile time, not only at runtime, because we want to be able to compile this stuff to eg Java, and because we want everything to have the efficiency of statically typed code at this low level, because we want the Oot implementation itself to be statically typed.

(later) how about the module that created the address subspace: perhaps this could be forced to be known at compile-time? The reason is that i was thinking about how we want to be able to 'monkeypatch' the primitive instructions on a per-module basis -- if that extended to subroutines called then it's kinda dynamic anyways, but... mb not extend it to subroutines called, for exactly that reason. Does 'module that created the address subspace' really help that much over 'address subspace' anyways? It's not like this would make the compiler able to decide at compile-time how to specialize each callsite. Another argument in favor of 'address subspace' is that, since our 'pointer arithmetic' in address subspaces is logical rather than physical (+1 gets to the next field in the struct, not to the next byte), the implementation already has to do some dynamic indirection in order to do anything. For this reason, perhaps introduce a notion of 'address subspace type' and require that, for each instruction, the 'address subspace type' of each operand of that instruction must be known at compile-time? But the annotations specifying this sound like a lot of additional data to lug around.

Attaching the boxing functions per-instance sounds more like prototypes, or like late-bound dynamic typing OOP.

So right now it seems like the two best choices might be: per-program, or per-type.

Per-program is simpler and less expressive, and per-type is more expressive. I guess the question is, how much harder to implement would per-type really be? I was already thinking about giving types to everything, but i hadn't really decided if stack maps etc would actually be mandatory, or if the VM does verification upon load (like the JVM). And how much would it slow things down, and make it harder to implement, for a naive interpreter implementation to be always looking up the type of everything? Wouldn't it have to be walking through the stack maps upon each instruction, or at least be maintaining some sort of parallel stack with type tags? Well, actually, it's not that hard: just keep a type tag with everything. Hey, it's already boxed, right? But now what about compilers. Yes, they have to keep up with the stack maps and find the type of everything, at least if they're compiling to a static target where they can't just make the target dynamically consult the type tags. But if they have a static target, then they're looking at the types anyways, right?
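The 'just keep a type tag with everything' option might look like this in a naive interpreter; each boxed value is a (tag, payload) pair and the handler is looked up per-type. The tags and the handler table are hypothetical.

```python
# Hypothetical per-type boxing handlers, keyed by a tag carried with each value.
handlers = {
    'plain':  {'read': lambda p: p},
    'getter': {'read': lambda p: p()},     # payload is a thunk
}

def box(tag, payload):
    return (tag, payload)

def read_boxed(value):
    tag, payload = value
    return handlers[tag]['read'](payload)  # per-type dispatch: one table lookup

assert read_boxed(box('plain', 42)) == 42
assert read_boxed(box('getter', lambda: 6 * 7)) == 42
```

So the interpreter-side cost of per-type is roughly one table lookup per boxed access; the real cost, as noted above, lands on compilers targeting static languages.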

What about my idea that at the Oot Bytecode (i should start saying OVM, Oot VM) level, things should not be polymorphic? Because we don't want the inefficiency of runtime polymorphic dispatch, we don't want the inefficiency of recompiling everything multiple times, once for each type, we want the benefits of monomorphic inline caches. Well, i already said i want generics, so we can't have all that all the time.

And i'm still worried about compiling to non-polymorphic static languages like C. Well, the fact is that Oot Core will be polymorphic, so stuff in C is gonna get clunky anyways.


mb see: http://beust.com/weblog/2011/07/29/erasure-vs-reification/ http://stackoverflow.com/questions/879855/what-are-reified-generics-how-do-they-solve-type-erasure-problems-and-why-cant http://stackoverflow.com/questions/1927789/why-should-i-care-that-java-doesnt-have-reified-generics http://programmers.stackexchange.com/questions/176665/generics-and-type-erasure http://stackoverflow.com/questions/355060/c-sharp-vs-java-generics http://stackoverflow.com/questions/31693/what-are-the-differences-between-generics-in-c-sharp-and-java-and-templates-i https://news.ycombinator.com/item?id=8381870

"


This is an old question, there are a ton of answers, but I think that the existing answers are off the mark.

"reified" just means real and usually just means the opposite of type erasure.

The big problem related to Java Generics:

    This horrible boxing requirement and disconnect between primitives and reference types. This isn't directly related to reification or type erasure. C#/Scala fix this.
    No self types. JavaFX 8 had to remove "builders" for this reason. Absolutely nothing to do with type erasure. Scala fixes this, not sure about C#.
    No declaration side type variance. C# 4.0/Scala have this. Absolutely nothing to do with type erasure.
    Can't overload void method(List<A> l) and method(List<B> l). This is due to type erasure but is extremely petty.
    No support for runtime type reflection. This is the heart of type erasure. If you like super advanced compilers that verify and prove as much of your program logic at compile time, you should use reflection as little as possible and this type of type erasure shouldn't bother you. If you like more patchy, scripty, dynamic type programming and don't care so much about a compiler proving as much of your logic correct as possible, then you want better reflection and fixing type erasure is important.

"

" Mostly I find it hard with serialization cases. You often would like to be able to sniff out the class types of generic things getting serialized but you are stopped short because of type erasure. It makes it hard to do something like this deserialize(thingy, List<Integer>.class) – Cogman Aug 6 '14 at 23:10 "

http://beust.com/weblog/2011/07/29/erasure-vs-reification/ :

gotchas with type erasure:

type, replaced it with Object (since that’s exactly what’s happening behind the scenes):

Overloading:

    public class Test<K, V> {
      public void f(K k) {
      }
      public void f(V v) {
      }
    }

    T.java:2: name clash: f(K) and f(V) have the same erasure
      public void f(K k) {
                  ^
    T.java:5: name clash: f(V) and f(K) have the same erasure
      public void f(V v) {

The workaround here is simple: rename your methods.

Introspection:

    public class Test {
      public <T> void f() {
        Object t;
        if (t instanceof List<T>) { ... }
      }
    }

    Test.java:6: illegal generic type for instanceof
        if (t instanceof List<T>) {}

There is no easy workaround for this limitation, you will probably want to be more specific about the generic type (e.g. adding an upper bound) or ask yourself if you really need to know the generic type T or if the knowledge that t is an object of type List is sufficient.

Instantiation:

    public class Test {
      public <T> void f() {
        T t = new T();
      }
    }

    Test.java:3: unexpected type
    found   : type parameter T
    required: class
        T t = new T();
"

"

Generics is a complicated language feature. It becomes even more complicated when added to an existing language that already has subtyping. These two features don’t play very well together in the general case, and great care has to be taken when adding them to a language. Adding them to a virtual machine is simple if that machine only has to serve one language – and that language uses the same generics. But generics isn’t done. It isn’t completely understood how to handle correctly and new breakthroughs are happening (Scala is a good example of this). At this point, generics can’t be considered “done right”. There isn’t only one type of generics – they vary in implementation strategies, feature and corner cases.

What this all means is that if you want to add reified generics to the JVM, you should be very certain that that implementation can encompass both all static languages that want to do innovation in their own version of generics, and all dynamic languages that want to create a good implementation and a nice interfacing facility with Java libraries. Because if you add reified generics that doesn’t fulfill these criteria, you will stifle innovation and make it that much harder to use the JVM as a multi language VM.

I’m increasingly coming to the conclusion that multi language VM’s benefit from being as dynamic as possible. Runtime properties can be extracted to get performance, while static properties can be used to prove interesting things about the static pieces of the language.

Just let generics be a compile time feature. If you don't, there are two alternatives – you are an egoist that only cares about the needs of your own language, or you think you have a generic type system that can express all other generic type systems. I know which one I think is more likely. "

" value types would be awesome. F# is a good example of such a language (scala and clojure aren't)

jdmichal:

> a compromise would perhaps be to allow at least reified primitive generics so that List<int> would be a proper runtime type whereas List<Object> would be the runtime version of all reference types in List<T>.

This is actually pretty much exactly what the .NET runtime does. All value types get a separately-JIT'd version of the type, while all reference types share the same Object-based version. See section on implementation here:

http://msdn.microsoft.com/en-us/library/ms379564(v=vs.80).as...

pron:

The work is already underway in Project Valhalla: http://openjdk.java.net/projects/valhalla/

Once it's been decided Java should get value types, generic reification became a more urgent necessity. "

"

alkonaut:

Mutable structs in c# are rare, rarely useful and often dangerous, but in some perf critical scenarios that hack is needed. Most notably the standard List<T>.Enumerator is a mutable struct because otherwise an object would have to be created on the heap for the sole purpose of iterating a list.

Not sure whether this was actually one of the reasons for allowing mutable structs to begin with, the use case is critical enough that it might well have been. "

"Java uses the notion of type erasure to implement generics. In short the underlying compiled classes are not actually generic. They compile down to Object and casts. In effect Java generics are a compile time artifact and can easily be subverted at runtime.

C# on the other hand, by virtue of the CLR, implements generics all the way down to the byte code. The CLR took several breaking changes in order to support generics in 2.0. The benefits are performance improvements, deep type safety verification and reflection. "

Java: " When a generic type is instantiated, the compiler translates those types by a technique called type erasure — a process where the compiler removes all information related to type parameters and type arguments within a class or method. Type erasure enables Java applications that use generics to maintain binary compatibility with Java libraries and applications that were created before generics. "

C# " This design choice is leveraged to provide additional functionality, such as allowing reflection with preservation of generic types, as well as alleviating some of the limitations of erasure (such as being unable to create generic arrays). This also means that there is no performance hit from runtime casts and normally expensive boxing conversions. "

" Something that could be seen as a disadvantage of reifiable types (at least in C#) is the fact that they cause code explosion. For instance List<int> is one class, and a List<double> is another totally different, as it is a List<string> and a List<MyType?>. So classes have to be defined at runtime, causing an explosion of classes and consuming valuable resources while they are being generated. "

sounds like most everyone is saying that C#'s non-type-erasure/'reified-generics' system is better and that Java only did type erasure for backwards compatibility (so that programs with the new generics would still run on older JVMs). 'Reified generics' seem to mean that the VM is aware of the types somehow.

but:

"

Another advantage of erased generics, is that different languages that compile to the JVM employ different strategies for generics, e.g. Scala's definition-site versus Java's use-site covariance. Also Scala's higher-kinded generics would have been more difficult to support on .Net's reified types, because essentially Scala on .Net ignored C#'s incompatible reification format. If we had reified generics in the JVM, most likely those reified generics wouldn't be suitable for the features we really like about Scala, and we'd be stuck with something suboptimal. Quoting from Ola Bini's blog,

    What this all means is that if you want to add reified generics to the JVM, you should be very certain that that implementation can encompass both all static languages that want to do innovation in their own version of generics, and all dynamic languages that want to create a good implementation and a nice interfacing facility with Java libraries. Because if you add reified generics that doesn’t fulfill these criteria, you will stifle innovation and make it that much harder to use the JVM as a multi language VM.

Personally I don't consider the necessity of using a TypeTag context bound in Scala for conflicting overloaded methods as a disadvantage, because it moves the overhead and inflexibility of reification from a global (whole program and all possible languages) to a use-site per language issue that is only the case less frequently.

"

-- http://programmers.stackexchange.com/a/212038/167249

"Writing programs in the presence of erasure *forces* you to avoid excessive coupling to runtime type knowledge, which is required if you actually want to write reusable code. " -- https://groups.google.com/forum/#!msg/scala-language/PV4q6O1qIh8/cy0vVFTMdr4J


ok so we probably want some sort of I/O in our 'primitive instructions'.

(1) Because let's say you are writing a new Oot Bytecode implementation right now: the first thing you want to do is NOP, the next is to add 1+1. But after adding 1+1 you, the implementor, want to see if the result was right. You could print out a dump of memory at the end of the program, but pretty soon you're going to want a PRINT command, or at least a LOG command.

(2) In addition, the idea was that a naive implementation would implement ONLY the primitives; but how are you going to have an Oot compiler or interpreter with no way to read the source code files or to write out object files? Again, you could just have your implementation have magic memory locations that it reads and writes to, but how will the standard Oot implementation know which those are? They have to be standardized, at which point you may as well have I/O instructions.

(3) My belief is that the Turing machine abstraction falls a little short and what computers really are today are INTERACTIVE Turing-like machines. So we should put this interactivity into our primitive computational model.

(4) We really want a LOG in there pretty early too.
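Points (1) and (2) could be sketched as a naive interpreter with explicit I/O instructions rather than magic memory locations. The instruction names here (LI, ADD, PRINT) are made up for illustration, not from any Oot spec.

```python
import io
import sys

def run(program, out=sys.stdout):
    """A toy interpreter: 16 memory cells, three instructions, one I/O primitive."""
    mem = [0] * 16
    for instr in program:
        op, *args = instr
        if op == 'LI':                        # load immediate
            dst, imm = args
            mem[dst] = imm
        elif op == 'ADD':
            dst, a, b = args
            mem[dst] = mem[a] + mem[b]
        elif op == 'PRINT':                   # the I/O primitive at issue
            out.write(str(mem[args[0]]) + '\n')
    return mem

buf = io.StringIO()
run([('LI', 0, 1), ('LI', 1, 1), ('ADD', 2, 0, 1), ('PRINT', 2)], out=buf)
assert buf.getvalue() == '2\n'                # the implementor's 1+1 check
```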

so ppl say IOCP >? kqueue >> epoll >> poll > select. So should we implement something like IOCP or kqueue?
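For comparison, readiness-based multiplexing in the kqueue/epoll family looks like this; Python's selectors module picks the best mechanism the platform offers (kqueue on BSD/macOS, epoll on Linux, falling back toward poll/select), which is one way an implementation could get 'the best available' without hardcoding one. IOCP is the odd one out because it is completion-based rather than readiness-based.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()                 # a connected pair, for demonstration
sel.register(b, selectors.EVENT_READ)

a.sendall(b'ping')
events = sel.select(timeout=1.0)           # readiness notification, not completion
ready = [key.fileobj for key, mask in events]
assert b in ready                          # b is now readable...
assert b.recv(4) == b'ping'                # ...and we do the read ourselves

sel.unregister(b)
a.close()
b.close()
```

Under IOCP the kernel would instead perform the read and notify us when it has completed, which is the model Erlang-style runtimes tend to emulate on top of readiness APIs.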

ppl say Erlang has better async I/O than Node [1]