proj-oot-ootInteropNotes1

re: go: "In addition, interacting with popular libraries (such as libsdl or even OpenGL) that use thread-local variables (TLS) means using ugly workarounds like this one:

http://code.google.com/p/go-wiki/wiki/LockOSThread "

" Some libraries, especially graphical frameworks/libraries like Cocoa, OpenGL, and libSDL, require being called from the main OS thread, or always from the same OS thread, due to their use of thread-local data structures. Go's runtime provides the LockOSThread() function for this, but it's notoriously difficult to use correctly. " -- see http://code.google.com/p/go-wiki/wiki/LockOSThread for solution code

--

random blog post, havent skimmed:

http://www.knewton.com/tech/blog/2012/10/java-scala-interoperability/

--


for interop with low-level stuff:

note that fixed-length fields within a record can be represented by an annotation overlay (node labels of a certain type/a certain label-label) on top of an array of 'byte's

--

should make sure that

(a) calling Java method, or calling a Oot method from Java, are concise (b) passing/converting basic oot data into a Java list, or passing/converting a Java lisp into a oot list, is concise

--

ability to do things like "mmap this file and return me an array of ComplicatedObject[] instances" (from https://news.ycombinator.com/item?id=6425412)

--

" If you want to call Scala from Java and have it look nice your Scala can't rely on Scala-specific language features such as implicit conversions, implicit arguments, default arguments, symbolic method names, by-name parameters, etc. Generics should be kept relatively simple as well. If your Scala objects are relatively simple then there shouldn't be any problem using them from Java. – Erik Engbrecht May 21 '11 at 13:09 "

--

need to support some sort of GOTO in order to allow fast interpreters without dropping to assembly, because you need to help the hardware's branch predictor. The structured-programming transformation, in which a single 'switch' maps between basic blocks, means that the branch predictor tries to predict the switch's target essentially by measuring the frequency distribution of landings on that one indirect branch, which of course is all over the place. What you want instead is many branch/goto instructions at various places in the code, some of which have non-uniform frequency distributions over their targets, so that the predictor can learn the distribution for each one separately and speculatively branch while you are still executing the code leading up to the goto. Apparently this branch misprediction accounts for a large proportion of the slowdown between actual assembly and software-emulated assembly written in higher-level structured-programming languages like C, according to the following blog post:

http://www.emulators.com/docs/nx25_nostradamus.htm

" the writing is now on the wall pointing the way toward simpler, scaled down CPU cores that bring back decades-old concepts such as in-order execution, and the use of binary translation to offload complex instructions from hardware into software.

For this anniversary posting, I am going to tackle the one giant gaping hole of my argument that I haven't touched on so far. Over the past few months I've demonstrated how straightforward it is to implement many aspects of a virtual machine in a portable and efficient manner - how to simulate guest conditional flags without explicit use of the host processor's flags register, how to handle byte-swapping and endianness differences without an explicit byte-swapping instruction on the host, how to perform safe guest-to-host memory access translation and security checks without need for a host MMU or hardware virtualization, and how to optimize away most of the branch mispredictions of a typical CPU interpreter such as to achieve purely-interpreted simulation speed levels of 100 guest MIPS or faster.

But as I hinted at last week, there is still one area I haven't explored with you, and that is the crucial indirection at the heart of any interpreter loop; the indirect call or indirect jump which directs the interpreter to the next guest instruction's handler. An indirection that by design is doomed to always mispredict and thus severely limit the maximum speed of any interpreter. As the data that Stanislav and I presented at ISCA shows, the speed ratio between purely interpreted Bochs and purely jitted QEMU is almost exactly due to the extra cost of a branch misprediction on every guest x86 instruction simulated. Eliminate or reduce the rate of that branch misprediction, and you can almost close the performance gap between an interpreter and a jitter, and thus debunk one of the greatest virtual machine myths of all - the blind faith in the use of jitting as a performance accelerator.

I will first show why this misprediction happens and why today's C/C++ compilers and microprocessors are missing one very obvious optimization that could make this misprediction go away. Then I will show you the evolution of what I call the Nostradamus Distributor, an interpreter dispatch mechanism that reduces most of the mispredictions of the inner CPU loop by helping the host CPU predict the address of the next guest instruction's handler. A form of this mechanism was already implemented in the Gemulator 9 Beta 4 release posted a few months ago, and what I will describe is the more general C/C++ based portable implementation that I plan to test out in Bochs and use in the portable C implementation of Gemulator.

...

The Common CPU Interpreter Loop Revisited

I introduced you to a basic CPU interpreter loop last October in Part 7, which I will now bore you with again for the last time:

void Emulate6502()
{
    register short unsigned int PC, SP, addr;
    register unsigned char A, X, Y, P;
    unsigned char memory[65536];

    memset(memory, 0, 65536);
    load_rom();
    /* set initial power-on values */
    A = X = Y = P = 0;
    SP = 0x1FF;
    PC = peekw(0xFFFC);
    for(;;)
        {
        switch(peekb(PC++))
            {
        default: /* undefined opcode! treat as nop */
        case opNop:
            break;
        case opIncX:
            X++;
            break;
        case opLdaAbs16:
            addr = peekw(PC);
            PC += 2;
            A = peekb(addr);
            break;

        ...
            }
        }
}

This is a hypothetical piece of sample code that could be used as a template to implement a 6502 interpreter.

...

What I have never seen a C or C++ compiler do for such interpreter loop code is make one further optimization. What if, instead of generating the jump instruction to the top of the "for" loop, the compiler was smart enough to simply compile the fetch and dispatch into each handler? In other words, what if there was some kind of funky "goto" keyword syntax that would allow you to write this in your source code to hint to the compiler to do that:

        switch(peekb(PC++))
            {
        default: /* undefined opcode! treat as nop */
        case opNop:
            goto case(peekb(PC++));
        case opIncX:
            X++;
            goto case(peekb(PC++));
        case opLdaAbs16:
            addr = peekw(PC);
            PC += 2;
            A = peekb(addr);
            goto case(peekb(PC++));

You would now have an interpreter loop that simply jumped from instruction handler to instruction handler without even looping. Unless I missed something obvious, C and C++ lack the syntax to specify this design pattern, and the optimizing compilers don't catch it. It is for this reason that for all of my virtual machine projects over the past 20+ years I have resorted to using assembly language to implement CPU interpreters. Because in assembly language you can have a calculated jump target that you branch to from the end of each handler, as in this x86 example code, which represents the typical instruction dispatch code used in most past versions of Gemulator, SoftMac, and Fusion PC:

    movzx ebx,word ptr gs:[esi]
    add esi,2
    jmp fs:[ebx*4]

The 68040 guest program counter is in the ESI register, the 68040 opcode is loaded into EBX, the 68040 program counter is then incremented, and then the opcode is dispatched using an indirect jump. In Fusion PC the GS and FS segment registers are used to point to guest RAM and the dispatch table respectively, while in Gemulator and SoftMac there were explicit 32-bit address displacements used. But the mechanism is the same and you will see this pattern in many other interpreters.

The nice thing about handler chaining is that it has a beneficial side-effect! Not only does it eliminate a jump back to the top of a loop; by spreading out the indirect jumps from one central point into each of the handlers, the host CPU now has dozens if not hundreds of places that it dispatches from. You might say to yourself this is bad, I mean, this bloats the size of the interpreter's code and puts an extra strain on the host CPU's branch predictor, no?

Yes! But, here is the catch. Machine language opcodes tend to follow patterns. Stack pushes are usually followed by a call instruction. Pops are usually followed by a return instruction. A memory load instruction is usually followed by a memory store instruction. A compare is followed by a conditional jump (usually a Jump If Zero). Especially with compiled code, you will see patterns of instructions repeating over and over again. That means that if you are executing the handler for the compare instruction, chances are very good that the next guest instruction is a conditional jump. Patterns like this will no doubt make up a huge portion of the guest code being interpreted, and so what happens is that the host CPU's branch predictor will start to correctly predict the jump targets from one handler to another.

...

gcc's computed goto:

http://web.archive.org/web/20100130194117/http://blogs.sun.com/nike/entry/fast_interpreter_using_gcc_s

"

--

incidentally the above blog is recommended by another random blog:

" vx32, an unconventional (but not exactly new) approach to virtualization: segmentation registers and instruction translation. An excellent introduction to this is provided by No Execute , and if you are interested in creating or working with virtual machines (from interpreters to emulators), I can’t recommend No Execute enough. " http://www.emulators.com/nx_toc.htm

--

" The Inferno shell is one of the most interesting features, however. It’s based on the rc shell by none other than Tom Duff of Duff’s Device fame. rc is a great shell, especially for scripting, runs almost everywhere, and is the default shell for Plan 9. Inferno’s version introduces an FFI to Limbo. Re-read that sentence if your jaw and the floor haven’t connected yet.

Through the Inferno shell builtin “load”, you can load modules that expand the shell’s builtins. For example, Inferno comes with a module to add regex support to the shell, one to add CSV support, and another to add support for Tk . I’ve not seen a feature quite like this in a shell, although I mostly stick to bash or rc, not having tried out the slightly more exotic ksh or zsh shells, which for all I know also have that feature. "

--

" Structures in Go are laid out in memory identically to the same in C, allowing for potential zero copy use on either side (made possible given that addresses of data are fixed in Go — it is not a compacting GC, and doesn’t need the normal pinning functionality or expensive boundary crossing often seen in GC VMs like Java or .NET). There are concerns regarding who frees memory, and the lifetime of passed objects that *are* bound to be GCs, but those are topics for another time. "

example: http://dennisforbes.ca/index.php/2013/07/31/demonstrating-gos-easy-c-interop/

http://golang.org/cmd/cgo/

pointers are available, but no pointer arithmetic: http://golang.org/doc/faq#no_pointer_arithmetic

--

" Actually, the hardest problem was getting the instrumentation agent to identify suspendable Clojure functions. This is quite easy with Java Quasar code as suspendable methods declare themselves as throwing a special checked exception. The Java compiler then helps ensure that any method calling a suspendable method must itself be declared suspendable. But Clojure doesn't have checked exceptions. I thought of using an annotation, but that didn't work, and skimming through the Clojure compiler's code proved that it's not supported (though this feature could be added to the compiler very easily). In fact, it turns out you can't mark the class generated by the Clojure compiler for each plain Clojure function in any sensible way that could be then detected by the instrumentation agent. Then I realized it wouldn't have mattered because Clojure sometimes generates more than one class per function.

I ended up notifying the instrumentation agent after the function's class has been defined, and then retransforming the class bytecode in memory. Also, because all Clojure function calls are done via an interface (IFn), there is no easy way to recognize calls to suspendable functions in order to inject stack management code at the call-site. An easy solution was just to assume that any call to a Clojure function from within a suspendable function is a call to a suspendable function (although it adversely affects performance; we might come up with a better solution in future releases). "

---

this clojure library looks like the sort of thing i was thinking about! :

https://github.com/ztellman/gloss

---

" Bitfields would be really handy when writing a device driver. A basic example of the difference they would make is "if (reg.field == VAL)" vs. either "if (reg & MASK == VAL)" or "if (GET_FIELD(reg) == MASK)". But you can't use them for that purpose because their layout is implementation defined.

There would be a noteworthy performance benefit to pay for portable bitfields, but I think it would be worth it. Right now I have to write ugly code if I want any attempt at portability (always). I'd much rather have to write ugly code where I need speed (sometimes).

I'd love to hear if anyone else has any ideas on this matter (BTW it seems like D solves many of the problems I've been thinking about, but it's memory managed). "

---

http://pyjnius.readthedocs.org/en/latest/

---

go's C interop

"Go has a foreign function interface to C, but it receives only a cursory note on the home page. This is unfortunate, because the FFI works pretty darn well. You pass a C header to the "cgo" tool, and it generates Go code (types, functions, etc.) that reflects the C code (but only the code that's actually referenced). C constants get reflected into Go constants, and the generated Go functions are stubby and just call into the C functions.

The cgo tool failed to parse my system's ncurses headers, but it worked quite well for a different C library I tried, successfully exposing enums, variables, and functions. Impressive stuff.

Where it falls down is function pointers: it is difficult to use a C library that expects you to pass it a function pointer. I struggled with this for an entire afternoon before giving up. Ostsol got it to work through, by his own description, three levels of indirection. " -- http://ridiculousfish.com/blog/posts/go_bloviations.html#go_ccompatibility

--

" Instead, the OCaml code created a pair of pipes and spawned a Python subprocess. You need to use two pipes, not a single socket, because Windows doesn't support Unix sockets. "

---

this thread compares Traceur and Babeljs for Javascript ES6 -> ES5 compilation (which ppl call "transpilation" for some odd reason):

https://news.ycombinator.com/item?id=9090958

"issues are addressed within a day and the author is an overall great guy"

"Traceur isn't readable at all (at least not to me) which might not matter, but I think in some cases, Babel's output is closer to more traditional Javascript and more performant"

"Traceur scared people away both because it was a build tool and had a runtime dependency. Now that tools like Webpack are common, introducing a build step at any point in your workflow is trivial. Moreover, many of the JS Harmony improvements can be transpiled with a tool like Babel or jstransform without the need to introduce yet another library in your deployed code."

"

Bahamut 2 days ago

I like Traceur (at least with tools like System.js & jspm), especially with its support for optional typing and annotations, but for some apps, Babel makes a lot more sense.

I came across the difficulty of using jspm on the server and client for an isomorphic React app I am building for an online community I run, and I was advised to just use npm (& naturally Babel). Integrating browserify with babelify into the build process was a far easier task."

--

look at all the trouble i go thru with image_singlegenes. It should be easy to make an array of images in Python and then transfer them into the Octave code being run. A good glue language would handle that sort of thing.

that is, the stuff in eg imsave_via_octave should be almost builtin (contentful 'pipes' at least, but really, stacks and named variables should transfer, even mutable ones/references), and it should be easy to write stuff like image_singlegenes by calling it

--

" mooreds 17 hours ago

I haven't done much with the other JVM languages except play around with them (did work with a small jython project about 10 years ago...). Have you encountered any impedance mismatch? Or weird cross language bugs?

reply

infraruby 15 hours ago

JRuby converts values (Java primitives <-> ruby Numeric, java.lang.String <-> ruby String, etc.) sometimes with unexpected results.

JRuby does not wrap primitive values, or provide values that behave like primitives, but you can add that: https://rubygems.org/gems/infraruby-java

reply "

---

https://groups.google.com/forum/m/#!topic/golang-nuts/RwJaZh0nJA4 notes that the gold standard of embedding, "embedding "blindfolded" Go code in C, by compiling Go to object code, and then linking to that from C," won't work in Golang, because

" The biggest problem is the fact that Go has its own runtime, namely a garbage collector and a scheduler (for go routines) and segmented stacks that do not play well with C stacks. It's much more complicated than linking the executable code and translating between calling conventions (e.g. what happens to a goroutine created in Go code you called from C after Go code returns to C?)

Now, it's not impossible to overcome this e.g. V8 or Microsoft's .NET have embedding APIs that allow for a decent way of calling JavaScript/CLR code from C/C++ code, but that doesn't exist for any current implementation of Go (as far as I know) and doing it right is a significant amount of work.

"

---

https://news.ycombinator.com/item?id=9376793

---

http://ebb.org/bkuhn/blog/2014/06/09/do-not-need-cla.html

---

apl 16 hours ago

The Julia FFI for Python is absolutely excellent -- calling a particular Python library from Julia takes very, very little effort. As a language for scientific programming, Julia is way ahead of Python. So I'm not sure if this particular argument holds much water.

https://github.com/stevengj/PyCall.jl

reply

---

weak references, and a way to receive a notification sometime after the garbage-collection of a weak reference, is rather necessary for interop: see https://news.ycombinator.com/item?id=9735973


munificent 5 hours ago

WebAssembly still has a long way to go. They don't have a plan yet for:

I don't think any high level language will be able to compete with JS on an even playing field until that language can use the high performance GC that's already in every browser.

If your language has to either not use GC (huge productivity loss) or ship your own GC to the end user with your app (huge app size), it's an unfair fight.

reply

---

webassembly

---

spullara 4 hours ago

It is really too bad that at some point in the last 18 years of Java VMs being in browsers they didn't formalize the connection between the DOM and Java so that you could write code that interacted directly with the DOM and vice versa in a mature VM that was already included. Would have been way better than applets, way faster than Javascript and relatively easy to implement. The browsers actually have (had?) APIs for this but they were never really stabilized.

reply

hello_there 3 hours ago

I find it interesting that Java didn't become the standard for this as it seems like it has everything and is both fast and mature.

What might be the reason?

reply

titzer 1 hour ago

There are several important lessons to learn from the Java bytecode format and members of the WebAssembly team (including myself) do have experience here. In particular, JVM class files would be a poor fit for WebAssembly because:

1. They impose Java's class and primitive type model.
2. They allow irreducible control flow.
3. They aren't very compact. Lots of redundancy in constant pools across classes and still a lot of possibility for compression.
4. Verification of JVM class files is an expensive operation requiring control and dataflow analysis (see stackmaps added in the Java 7 class file format for rationale).
5. No notion of low-level memory access. WebAssembly specifically addresses this, exposing the notion of a native heap that can be bit-banged directly by applications.

reply

BrendanEich 2 hours ago

See https://news.ycombinator.com/item?id=1894374 from @nix.

reagency 2 hours ago

Back when Java Applets were a thing, Sun wasn't friendly with browser makers. JavaScript was a gimmicky alternative that was created by a browser manufacturer. It had the foothold, and it grew.

Now Oracle isn't interested in the Web.

reply

nix 1679 days ago


My admittedly biased view: I spent two years of my life trying to make the JVM communicate gracefully with Javascript - there were plenty of us at Netscape who thought that bytecode was a better foundation for mobile code. But Sun made it very difficult, building their complete bloated software stack from scratch. They didn't want Java to cooperate with anything else, let alone make it embeddable into another piece of software. They wrote their string handling code in an interpreted language rather than taint themselves with C! As far as I can tell, Sun viewed Netscape - Java's only significant customer at the time - as a mere vector for their Windows replacement fantasies. Anybody who actually tried to use Java would just have to suffer.

Meanwhile Brendan was doing the work of ten engineers and three customer support people, and paying attention to things that mattered to web authors, like mixing JS code into HTML, instant loading, integration with the rest of the browser, and working with other browser vendors to make JS an open standard.

So now JS is the x86 assembler of the web - not as pretty as it might be, but it gets the job done (GWT is the most hilarious case in point). It would be a classic case of worse is better except that Java only looked better from the bottom up. Meanwhile JS turned out to be pretty awesome. Good luck trying to displace it.

SWF was the other interesting bytecode contender, but I don't know much about the history there. Microsoft's x86 virtualization tech was also pretty cool but they couldn't make it stick alone.

---

LuaJIT 2's FFI is recommended by https://news.ycombinator.com/item?id=9762678 (who also likes C#'s and Swift's) and https://news.ycombinator.com/item?id=9763555

https://news.ycombinator.com/item?id=9763611 likes Python's new CFFI too, and thinks it is "basically a port of luajit's ffi", and also likes http://ruby-doc.org/stdlib-2.0.0/libdoc/fiddle/rdoc/Fiddle.html (which e says is a "ruby DSL to libffi")

---

spot 23 hours ago

totally agree and that's why we made Beaker: http://beakernotebook.com/

you can code in multiple languages in your notebook, and they can all communicate, making it easy to go from Python to R to JavaScript, seamlessly.

we just released v1.4 with all kind of new features, check it out: https://github.com/twosigma/beaker-notebook/releases/tag/1.4...

---

stared 203 days ago

Speaking as a Pythonist (but one who is in love with ggplot2 and dplyr), the wonderful thing about IPython Notebook is that it's possible to inline R code with no more fuss than adding "%%R" in a cell:

http://nbviewer.ipython.org/github/davidrpugh/cookbook-code/...

BTW: For pandas-dplyr dictionary: http://nbviewer.ipython.org/gist/TomAugspurger/6e052140eaa5f...

---

rayiner 39 minutes ago

Small thing, but nice to see that the API is exposed as C (easy to write bindings for) instead of C++ (near-impossible to write bindings for).

reply

xenadu02 29 minutes ago

Slight derail but I find it interesting that the Swift team is tackling the fragile ABI problem with v3... Something C++ could have done at any time, enabling portable and interoperable C++ interfaces.

(There's no reason clang importer couldn't surface C++ or any other language to Swift for that matter)

reply

saurik 10 minutes ago

Yeah. As much as I absolutely love C++11, and use its features constantly, if I had to make the hard call: "do I get lambdas, or do I get some extern ABI marker I can use to opt-in to a non-fragile ABI, with field and vtable offsets linked as symbols and return types mangled into function names", I would have made the (incredibly depressing) choice for the latter, as it makes C++ suddenly able to be used in ways that are otherwise insane to contemplate.

reply

angersock 22 minutes ago

Quite right! The most annoying thing about otherwise good C++ libs is that pesky fact. SWIG doesn't quite cut it for me.

reply

---

" Go is garbage collected, can C and Go share memory?

In short:

    Go can pass a pointer to C
    the referenced memory can't have pointers to Go allocated memory
    C can't keep pointers to the memory after the call returns

In more detail:

    the cgo docs.

This is checked by the runtime at execution.

You could disable the checks, but you probably shouldn't.

package main

/*
int fn(void* arg) {
    return arg == 0;
}
*/
import "C"

import "unsafe"

type T struct{ a, b int }
type X struct{ t *T }

func main() {
    t := T{a: 1, b: 2}
    C.fn(unsafe.Pointer(&t)) // correct

    x := X{t: &t}
    C.fn(unsafe.Pointer(&x)) // incorrect
}

Outputs:

panic: runtime error: cgo argument has Go pointer to Go pointer

" -- https://talks.golang.org/2016/state-of-go.slide#44

---

" Any decent language which is implemented regarding performance uses a c-style ABI and the HLL follows that. Even in high-level languages with much more advanced features, such as common lisp or scheme, with features similar to parrot. Look at Go for example.

Not the other way round as in parrot where the HLL dictates the ABI and c callouts and callbacks are slow, or worst of all perl5, where there's no c stack, where the args and lexicals are in an artificial array on the heap. c callouts should be fast, and should not require extensive protection or locks, otherwise you have to limit yourself to your language in your standard library. You have to re-implement everything from scratch. "

---

http://www.dyncall.org/

---

these links talk about how 'nested runloops' are a problem. I think they mean calling foreign code which then calls back into your code.

https://github.com/MoarVM/MoarVM/blob/master/docs/interpreter.markdown

http://perl6.niner.name/parrot/getting_rid_of_nested_runloops/index.html

i havent read this yet: http://whiteknight.github.io/2011/04/30/vision_parrot_concurrency_3.html

nor this: http://niner.name/Hybrid_Threads_for_the_Parrot_VM.pdf

---

" Low-level primitives.

    @struct class Vec(
      val x: Double,
      val y: Double,
      val z: Double
    )

    val vec = stackalloc[Vec] // pointer to stack allocation
    !vec = new Vec(1, 2, 3)   // store value to stack
    length(vec)               // pass by reference

Pointers, structs, you name it. Low-level primitives let you hand-tune your application to make it work exactly as you want it to. You're in control.

Extern objects.

    @extern object stdlib {
      def malloc(size: CSize): Ptr[_] = extern
    }

    val ptr = stdlib.malloc(32)

Calling C code has never been easier. With the help of extern objects you can seamlessly call native code without any runtime overhead.

Instant startup.

    > time hello-native
    hello, native!

    real    0m0.005s
    user    0m0.002s
    sys     0m0.002s

Scala Native is compiled ahead-of-time via LLVM. This means that there is no sluggish warm-up phase that's common for just-in-time compilers. Your code is immediately fast and ready for action. "

---

things relating to interop between scala-native, scalajs, and scala, and things that make them hard:

"

-- https://news.ycombinator.com/item?id=11678545 thread

airless_bar 19 hours ago

Scala macros are compile-time. The Scala reflection library (that works at runtime) shares most of its API with macros.

Macros should always work, reflection is harder as some information needs to be retained at runtime (either via Java reflection or additional data – which can both be problematic in Scala.js).

reply

more on scala-native:

densh 18 hours ago

The goal is to be as "true" as possible by default with extra flags to trade some exact semantic aspects for performance. E.g. overflow semantics, bounds checks, null safety etc.

Apart from the base language we'll also introduce some language/library extensions to enable lower-level programming; that is going to be a limited Scala Native "dialect" of Scala. E.g. pointers, structs, stack allocation, extern objects etc.

reply

airless_bar 18 hours ago

Is there a difference between @extern and @native?

I could have imagined that it would have been more compatible if Scala-JVM and Scala-Native used the same annotations (using JNI behind the scenes on the JVM).

reply

densh 15 minutes ago

@native implies a JNI-style definition to be available alongside the Scala method definition. @extern lets you call straight into C code without any additional ceremony; all you need is a single forward declaration.

reply

sjrd 15 hours ago

I would explain the difference as: @native says please implement this Scala method of this Scala class in C, respecting all the JVM calling conventions and memory model. @extern says call this C function that never knew about Scala or the JVM, respecting all the C calling conventions and memory model.

reply

---

denfromufa 12 hours ago [-]

Maybe because Jupyter has already more than 70 kernels and got adopted before other notebooks appeared (Beaker, Spark, Zeppelin)?

reply

spot 2 hours ago [-]

true but the languages are siloed, each notebook runs just one language. with beaker the languages can communicate with each other. there's no easier way to combine python and javascript for d3, for example: https://pub.beakernotebook.com/publications/7fdcaaa6-fb83-11...

there are lot more differences in the UI as well.

reply

nl 1 hour ago [-]

You can actually use R and Python in the same notebook. See https://blog.dominodatalab.com/lesser-known-ways-of-using-no...

The %Rpush and %Rpull magics are what you need.

Also, if you are using Spark, then Apache Toree[1] lets you use Python, R, Scala and SQL in the same notebook against Spark[2].

[1] https://toree.incubator.apache.org/documentation/user/how-it...

[2] https://github.com/ibm-et/spark-kernel/wiki/Language-Support...

reply

denfromufa 1 hour ago [-]

Not everyone needs more than one language in the same notebook communicating between each other. But if required, then the cell magic system looks superior to me: have a look at %%fortran, %%cython, %%javascript, %%html, %%bash options. Also it is possible to switch kernels in the same notebook, but serialization of state between kernels is handled by user.

reply

---

" The boundary between languages is always problematic. And the reason for that is because you choose memory layout to make certain trade-offs around the access patterns for that data. Do you inline data structures in a vector, or chase pointers? Do objects carry their type information with them so you can manipulate them dynamically, or do they throw it away at compile-time for greater efficiency? Do you get O(1) string indexing or native UTF-8? Do you allocate on the stack or the heap? Do you copy, borrow, move, or COW? "

---

truffle interop:

trishume 9 hours ago [-]

That's exactly how Truffle's cross-language-interface works though. Instead of paying the high cost of conversion, data stays in its existing representation and it is the interface code that is recompiled to fit.

An example is JRuby+Truffle where in its C extensions, pointers to Ruby objects act to the C code as if they are pointers to MRI data structures, but behind the scenes Graal compiles code that accesses them like the JRuby objects that they are.

...

It works without common memory layout between Truffle languages. It even simplifies the ability to use diverse physical memory layouts within the same language. The programmer specifies the logical layout based on the semantics of a language. The runtime decides how to map this logical layout onto the physical hardware. It can take into account the trade-offs the programmer decided to choose.

nostrademons 8 hours ago [-]

Interesting. So, if I'm understanding this correctly - Truffle allows a language designer to specify a certain logical layout for data structures, along with hints for how this will get converted to a physical memory layout. Language implementors also get the full power of Graal for lowering their AST to machine code and other compiler tasks. On cross-language boundaries, it generates automatic accessors for other languages to access that data, using the logical layout to identify how particular fields need to be pulled out and manipulated, but not requiring that the full data structure be converted across the foreign-call boundary. One consequence of this is that nesting & embedding of data structures may require an explicit conversion, since if an object is a hashtable in Javascript but a packed series of fields in C, it's obviously not going to fit.

Sorta like SWIG++? If you could do SWIG but never require that an end-user write a typemap or debug a crash themself, there'd probably be a big market for that.

---

Cacti 10 hours ago [-]

We should be clear here about what we mean by "language".

It's perfectly possible and manageable to create cross-language VMs and to translate pretty accurately between languages. However, what breaks down, as you alluded to, is that people expect the standard libraries, and the libraries of others, to work interchangeably, and that is orders of magnitude more complex (mostly because you are, by definition, working closer to the metal).

So, I need to ask, how much research is being done in giving the standard library as much consideration as the language? Do we have a grasp on what it would take to treat the stdlib as a "first class" citizen?

---

mb check out boost::python http://www.boost.org/doc/libs/1_61_0/libs/python/doc/html/index.html

---

the java code sample in

http://psy-lob-saw.blogspot.com/2013/04/lock-free-ipc-queue.html

---

"cgo is slow

I’d heard this, but I got a chance to observe it first-hand. " [1]

discussion:

[–]f2u 6 points 2 years ago

Why are cgo calls so slow? Do they perform a stack switch because C code expects more stack space?

[–]Rhomboid 5 points 2 years ago

In this message the reason cited is the necessity of doing a save/restore of all registers -- I assume that's what's meant by a "full register set switch". That would seem to be a logical consequence of Go not using the standard platform ABI but its own weirdo thing. I'm assuming that the stack switch to a larger C stack from the small segmented Go stack is also part of that procedure.

[–]monocasa 3 points 2 years ago

Well, if you swap out all registers, that's implicitly a stack swap.

[–]strncat 1 point 2 years ago

It needs to switch to a separate large C stack, and this makes the CPU very unhappy. This was half of the reason for Rust dropping segmented stacks, with the other half being the pain of thrashing on a segment boundary. Go has a precise garbage collector with no unmanaged pointers, so there are other options they could pursue in the future.

Go does system calls itself on Linux to avoid the overhead of calling through the POSIX wrapper functions, so it's not usually a problem for the networking niche it was designed around.

---

http://bloomberg.github.io/bucklescript/blog/index.html

---

(in reply to a comment regarding schemas vs. 'self-describing' data formats, eg those which include the types of things within the data being passed:)

aidenn0 14 hours ago [-]

Schemas are IMO the only way to go when you are transferring between heterogenous languages, even when all languages involved are untyped.

Consider javascript talking to common lisp. Of course JSON has a canonical mapping to javascript, but it does not for common lisp. Should a JS array be a lisp list or vector? Should lisp's NIL be false or null? Should a JS object decode to an alist, plist, or hash-table? Etc.

---

python rust interop

https://blog.sentry.io/2016/10/19/fixing-python-performance-with-rust.html

https://news.ycombinator.com/item?id=12748020

---

in Erlang/Elixir/BEAM:

sasa555 2 days ago [-]

It is possible to start external processes from BEAM and interact with them. I've blogged a bit about it at http://theerlangelist.com/article/outside_elixir

You can also write NIFs (native implemented functions) which run in BEAM process (see http://andrealeopardi.com/posts/using-c-from-elixir-with-nif...). The latter option should be the last resort though, because it can violate safety guarantees of BEAM, in particular fault-tolerance and fair scheduling.

So using a BEAM-based language as a "controller plane" while resorting to other languages in special cases is definitely a viable option.

Callmenorm 2 days ago [-]

I spent 30 minutes looking at NIF, but I was scared away. My understanding is that if the NIF crashes then BEAM crashes. Which leads me to think that if you need NIF then you need safety guarantees on the Native side that C can't provide.

sasa555 2 days ago [-]

Precisely, which is why I always advise to consider ports first :-)

However, in some situations the overhead of communicating with a port might be too large, so then you have two options:

  1. Move more code to another language which you run as a port.
  2. Use a NIF

It's hard to generalize, but I'd likely consider option 1 first.

If you go for a NIF, you can try to keep its code as simple as possible which should reduce the chances of crashing. You can also consider extracting out the minimum BEAM part which uses the NIF into a separate BEAM node which runs on the same machine. That will reduce the failure surface if the NIF crashes.

I've also seen people implementing NIFs in Rust for better safety, so that's another option to consider.

So there are a lot of options, but as I said, NIF would usually be my last choice precisely for the reason you mention :-)

derefr 2 days ago [-]

Think of NIFs as Erlang's equivalent to Rust's unsafe{} blocks. It's where you write the implementations of library functions that make system calls, and the like. But, like unsafe{} blocks, you do as little as possible within them.

For example, if you want to call some C API from Erlang where the C API takes a struct and returns a struct, you'll want to actually populate the request struct--and parse the return struct--on the Erlang side, using binary pattern matching. The C code should just take the buffer from enif_get_binary, cast it into the req struct, make the call, cast the result back to a buffer and pass it to enif_make_binary(), and then return that binary. No C "logic" that could be potentially screwed up. Just glue to let Erlang talk to a function it couldn't otherwise talk to. Erlang is the one doing the talking.

On the other hand, if you have a big, fat library of C code, and you want to expose it all to Erlang? Yeah, that's not what NIFs are for. (Port drivers can do that, but you're about the right amount of terrified of them here: they're for special occasions, like OpenSSL?.)

The "right" approach with some random untrusted third-party lib, is to 1. write a small C driver program for that library, and then 2. use Erlang to talk to it over some IPC mechanism (most easily, its stdio, which Erlang supports a particular protocol for.)

If you need more speed, you can still keep the process external: in the C process, create a SHM handle, and pass it to Erlang over your IPC mechanism. Write a NIF whose job is just to read from/write to that handle. Now do your blits using that NIF API. If the lib crashes, the SHM handle goes away, so handle that in a check in the NIF. Other than that, you're "safe."

sashaafm 2 days ago [-]

Love your blog and book Sasa. Could you elaborate on the fair-scheduling disruption by NIFs? I don't recall ever reading about that.

...

sasa555 2 days ago [-]

Thanks, nice to hear that!

Basically a NIF blocks the scheduler, so if you run a tight loop for a long time, there will be no preemption. Therefore, invoking foo(), where foo is a NIF which runs for, say, 10 seconds, means a single process will get 10 seconds of uninterrupted scheduler time, which is way more than other processes not calling that NIF.

There are ways of addressing that (called dirty schedulers), but the thing is that you need to be aware of the issue in the first place.

If due to some bug a NIF implementation ends up in an infinite loop, then the scheduler will be blocked forever, and the only way to fix it is to restart the whole system. That is btw. a property of all cooperative schedulers, so it can happen in Go as well.

In contrast, if you're not using NIFs, I can't think of any Erlang/Elixir program that will block the scheduler forever, and assuming I'm right, that problem is completely off the table.

---

https://news.ycombinator.com/item?id=6097754

azakai 1326 days ago

on: LLVM Intermediate Representation is better than as...

It's not just calling conventions. LLVM IR also bakes in various assumptions about the target platform, from endianness to structure alignment to miscellaneous facts like whether char is signed or unsigned.

In many of those there is no going back to the original representation, they are one-way.

If you have IR that you can compile to various archs and have it work there, that is a lucky thing in that particular case. But it is not what LLVM IR was designed for nor should that work in general.

phaemon 1326 days ago [-]

I don't understand what this means. Could you please give an example of some code that loses information in this way when compiled with LLVM?

azakai 1326 days ago [-]

Say you have a struct type X with properties int, double, int. The offset of the last property depends on how alignment works on the target platform - it could be 12 or 16 on some common ones. LLVM IR can contain a read from offset 12, hardcoded. Whereas the C code contains X.propertyThree, which is more portable.

nostrademons 1326 days ago [-]

But that's not how LLVM works, at least when I worked with it a couple years ago. You would define a struct type in terms of primitive types (int64, ptr, etc), and then use getelementptr with the offset of the field path you wanted. Yes, it's a numeric offset, but it's a field offset within the struct, not a byte offset. LLVM handles packing, alignment, and pointer size issues for you automatically.

azakai 1326 days ago [-]

Yes, you can define structs and use getelementptr to access values. But, frontends can also bake in offsets calculated from getelementptr. They can also bake in sizeofs of structs, for example. And LLVM optimizations end up baking in hardcoded values in more subtle ways too.

vidarh 1326 days ago [-]

Once you have defined a struct in terms of primitive types, it is platform dependent.

Consider C:

A C int can be 16 bits. Or 32. Or 64. Etc. As long as the constraints on its relation to the other types are met.

The moment the frontend specifies a primitive type for a field in the struct, that code is incompatible with a whole lot of platforms.

eropple 1326 days ago [-]

Your primitive types aren't LLVM's though, are they? I mean, I haven't looked at LLVM thoroughly (just enough to be familiar with it, a friend is writing a language he wanted some input on), but I would be surprised and disappointed if they had a C "int" type as opposed to "signed 32-bit integer" or whatever. At which point it's compatible with whatever else is throwing around a signed 32-bit integer.

vidarh 1326 days ago [-]

But that is exactly the point - that LLVM IR is not platform independent.

The frontend must choose which specific integer type "int" in C maps to. At that point, the IR is no longer machine independent - if you pick 32-bit signed ints to represent C "int", your program will not match the C ABI on any platform using 16-bit ints as C "int", and you won't be able to directly make calls to libraries on that platform, for example.

eropple 1325 days ago [-]

So use uint32_t?

vidarh 1325 days ago [-]

This misses the point. The point is that if you pass a C program that uses "int" through a C-compiler that spits out LLVM IR, the resulting LLVM IR is not portable.

You might not be able to change the C program - it might be using "int" because the libraries it needs to interface to uses "int" in their signature (and handles it appropriately) on the platforms you care about.

phaemon 1326 days ago [-]

Ah, I think I see....you mean I could write non-portable IR code by doing that, although LLVM would never produce code like that? I guess there must always be IR that the frontend will never produce then?

caf 1326 days ago [-]

No, the implication is that the LLVM IR that the frontend produces changes depending on the ultimate target that the LLVM IR will be compiled to. In other words, the frontends aren't backend-agnostic.

phaemon 1326 days ago [-]

Oh, right! That makes more sense. So you have to specify the backend when you start the process? I didn't know LLVM did that.

azakai 1326 days ago [-]

Yes, the frontend very much knows what target you are aiming for. It greatly affects the IR that is generated.

And once you generate that IR, you can't just build it for an arbitrary target; it must be the one it was generated for.

---

" cfg is much nicer than #ifdef for managing conditionally compiled code. (It turns out that first class support for something is better than a processor layer glued onto the top.) I do have a few gripes about cfg, the biggest of which is that there’s no way to derive custom configurations (I would love to be able to replace all my #[cfg(any(target_os = "ios", target_os = "android", target_os = "emscripten"))]s with something like #[cfg(opengles)].) It turns out this is possible! It was also frustrating to occasionally break the build on platforms other than the one I was primarily developing on but not have a way to determine that without doing a full build on every target platform. (It seems like the forthcoming portability lint will solve some or all of this pain.) ...

However, passing pointers across FFI boundaries is currently a little scary. Structs that are not marked repr(C) will currently work with FFI, but are very much not guaranteed to over the long term as their layout is subject to change (see Austin Hick’s recent work on layout optimization). improper_ctypes is an attempt at catching this issue, but in practice, most of the FFI APIs I use require a cast to void* which sidesteps the warning. It would be nice if there was a way to indicate in FFI signatures that a given pointer will (e.g. writing to an OpenGL? buffer) or won’t (e.g. passing a callback parameter) be read by C code, and improper_ctypes would apply to all of the “wills”, regardless of nearby casting. "

---

"

vvanders 4 days ago [-]

Great stuff, lines up a lot with what I'd care about as an ex-gamedev and spending a bit of time with Rust. One minor point:

> First class code hot-loading support would be a huge boon for game developers. The majority of game code is not particularly amenable to automated testing, and lots of iteration is done by playing the game itself to observe changes. I’ve got something hacked up with dylib reloading, but it requires plenty of per-project boilerplate and some additional shenanigans to disable it in production builds.

Lua is a great fit here and interops with Rust(and just about everything else) very well.

eridius 4 days ago [-]

A long time ago I wrote bindings to Lua 5.1 (https://github.com/kballard/rust-lua), but there was a nasty problem where Lua uses longjmp() for errors (it can be configured to use C++ exceptions, but that doesn't work because you can't throw those past an extern C boundary without hitting undefined behavior). longjmp(), of course, will just skip right past the intervening stack frames, meaning if your Rust code calls into Lua and it throws an error, then unless you wrapped that call with a lua_pcall, any values that live on the stack of your Rust function will never call their destructors.

I admit I haven't bothered to research the current state of things, but how do more recent Lua bindings handle this? Does Lua 5.3 actually have a proper solution here, or do most bindings just wrap every single call with lua_pcall? I didn't do that in my bindings because I wanted to offer the full speed of Lua, but it's certainly an option.

wahern 4 days ago [-]

The typical solution in C is to wrap and anchor such objects in the Lua VM so that they'll be garbage collected if an error is thrown. There are various patterns for doing this--direct binding of individual objects, or indirectly anchoring through an opaque staging struct--but that's the general idea.

Because Lua supports coroutines with a stack independent from the system C (Rust) stack, you often want to be careful mixing your stack-allocated objects. Lua 5.2 added lua_callk which allows yielding and resuming coroutines across the C API (that is, "yielding" a coroutine with a nested C or Rust function invocation).

Leveraging Lua's awesome coroutine support is one of the biggest reasons to use Lua, IMO.

Also, Lua can safely recover from out-of-memory (OOM) scenarios. On OOM it will throw an error that can be safely caught. Any Lua API which might internally allocate memory can throw this error, not just the lua_call family of routines. Quality libraries are expected to also handle OOM, which usually can be accomplished the same way as handling any other error: wrapping and anchoring temporaries, directly or indirectly, in the Lua VM.

"

---

" Julia's only drawback at this point is the relative dearth of libraries — but the language makes it unusually easy to interface with existing C libraries. Unlike with native interfaces in other languages, you can call C code without writing a single line of C, and so I anticipate that Julia's libraries will catch up quickly. "

---

"Julia has a “no boilerplate” philosophy: functions can be called directly from Julia without any “glue” code, code generation, or compilation — even from the interactive prompt. ... Shared libraries and functions are referenced by a tuple of the form (:function, "library") or ("function", "library") where function is the C-exported function name. library refers to the shared library name: shared libraries available in the (platform-specific) load path will be resolved by name, and if necessary a direct path may be specified.

A function name may be used alone in place of the tuple (just :function or "function"). In this case the name is resolved within the current process. This form can be used to call C library functions, functions in the Julia runtime, or functions in an application linked to Julia.

Finally, you can use ccall to actually generate a call to the library function. Arguments to ccall are as follows:

    (:function, “library”) pair (must be a constant, but see below).
    Return type, which may be any bits type, including Int32, Int64, Float64, or Ptr{T} for any type parameter T, indicating a pointer to values of type T, or Ptr{Void} for void* “untyped pointer” values.
    A tuple of input types, like those allowed for the return type. The input types must be written as a literal tuple, not a tuple-valued variable or expression.
    The following arguments, if any, are the actual argument values passed to the function.

As a complete but simple example, the following calls the clock function from the standard C library:

    julia> t = ccall( (:clock, "libc"), Int32, ())
    2292761

...

One common gotcha is that a 1-tuple must be written with a trailing comma. For example, to call the getenv function to get a pointer to the value of an environment variable, one makes a call like this:

    julia> path = ccall( (:getenv, "libc"), Ptr{Uint8}, (Ptr{Uint8},), "SHELL")
    Ptr{Uint8} @0x00007fff5fbffc45

    julia> bytestring(path)
    "/bin/bash"

Note that the argument type tuple must be written as (Ptr{Uint8},), rather than (Ptr{Uint8}). This is because (Ptr{Uint8}) is just Ptr{Uint8}, rather than a 1-tuple containing Ptr{Uint8}:

    julia> (Ptr{Uint8})
    Ptr{Uint8}

    julia> (Ptr{Uint8},)
    (Ptr{Uint8},)

In practice, especially when providing reusable functionality, one generally wraps ccall uses in Julia functions that set up arguments and then check for errors in whatever manner the C or Fortran function indicates them, propagating to the Julia caller as exceptions. This is especially important since C and Fortran APIs are notoriously inconsistent about how they indicate error conditions. For example, the getenv C library function is wrapped in the following Julia function in env.jl:

    function getenv(var::String)
        val = ccall( (:getenv, "libc"), Ptr{Uint8}, (Ptr{Uint8},), bytestring(var))
        if val == C_NULL
            error("getenv: undefined variable: ", var)
        end
        bytestring(val)
    end

The C getenv function indicates an error by returning NULL, but other standard C functions indicate errors in various different ways, including by returning -1, 0, 1 and other special values. This wrapper throws an exception clearly indicating the problem if the caller tries to get a non-existent environment variable:

    julia> getenv("SHELL")
    "/bin/bash"

    julia> getenv("FOOBAR")
    getenv: undefined variable: FOOBAR

Here is a slightly more complex example that discovers the local machine’s hostname:

    function gethostname()
        hostname = Array(Uint8, 128)
        ccall( (:gethostname, "libc"), Int32, (Ptr{Uint8}, Uint),
               hostname, length(hostname))
        return bytestring(convert(Ptr{Uint8}, hostname))
    end

This example first allocates an array of bytes, then calls the C library function gethostname to fill the array in with the hostname, takes a pointer to the hostname buffer, and converts the pointer to a Julia string, assuming that it is a NUL-terminated C string. It is common for C libraries to use this pattern of requiring the caller to allocate memory to be passed to the callee and filled in. Allocation of memory from Julia like this is generally accomplished by creating an uninitialized array and passing a pointer to its data to the C function.

When calling a Fortran function, all inputs must be passed by reference.

A prefix & is used to indicate that a pointer to a scalar argument should be passed instead of the scalar value itself. The following example computes a dot product using a BLAS function.

    function compute_dot(DX::Vector, DY::Vector)
        assert(length(DX) == length(DY))
        n = length(DX)
        incx = incy = 1
        product = ccall( (:ddot_, "libLAPACK"), Float64,
                         (Ptr{Int32}, Ptr{Float64}, Ptr{Int32}, Ptr{Float64}, Ptr{Int32}),
                         &n, DX, &incx, DY, &incy)
        return product
    end

The meaning of prefix & is not quite the same as in C. In particular, any changes to the referenced variables will not be visible in Julia. However, it will never cause any harm for called functions to attempt such modifications (that is, writing through the passed pointers). Since this & is not a real address operator, it may be used with any syntax, such as &0 or &f(x).

Note that no C header files are used anywhere in the process. Currently, it is not possible to pass structs and other non-primitive types from Julia to C libraries. However, C functions that generate and use opaque struct types by passing around pointers to them can return such values to Julia as Ptr{Void}, which can then be passed to other C functions as Ptr{Void}. Memory allocation and deallocation of such objects must be handled by calls to the appropriate cleanup routines in the libraries being used, just like in any C program. "