continuation of notes-computer-programming-programmingLanguageDesign-prosAndCons-golang
http://monoc.mo.funpic.de/go-rant/
http://go-lang.cat-v.org/quotes
---
http://www.stanford.edu/class/ee380/Abstracts/100428-pike-stanford.pdf
--
http://www.informit.com/articles/article.aspx?p=1623555
---
craigyk 4 days ago
I went and looked up some of my notes from back when I was trying out Go. I hope to learn from others whether these are real issues or simply my misunderstandings.
1. So we don't get generics, but the language built-ins seem to get to be type-parameterized (channels, slices, maps). Unfortunately the syntax for doing so for each of these is inconsistent (probably as a result of being special-cased rather than dog-fooded using language-level generics): []float, map[float]float, chan float.
2. Built-in types seem to receive other special treatment as well, including special initialization keywords (make vs. new) and built-in functions (len, cap, etc.), but I don't see why this needed to be the case, even for performance reasons. There's no reason why these built-in types couldn't pretend to implement built-in interfaces, to make them more consistent with user types, while having the compiler optimize them with special-case functions for efficiency.
3. Unused variables are a hard error which is a completely understandable stance. Unfortunately, I think people may use workarounds to get around this. Also, I can't believe unused variables are a hard error, but uninitialized variables are not! Instead we are supposed to trust that everything is OK since they get initialized to some kind of "zero" value that isn't even under the developer's control.
4. Other small quibbles: I think pattern matching on function arguments could have been implemented as sugar that uses interfaces and method calls under the covers. Also named return values are ugly, and the function declaration syntax could have been made more concise.
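Note to self, a minimal sketch of points 1 to 3 (not from the comment): it shows the three different declaration syntaxes, make vs. new, the built-in len and cap, and the fixed zero values. float64 stands in for the comment's `float`, which current Go does not have; the function names are made up for illustration.

```go
package main

import "fmt"

// describe builds one value of each built-in parameterised type. Each is
// declared with a different syntax, and each is queried with built-in
// functions rather than methods.
func describe() (sliceLen, sliceCap, mapLen, chanCap int) {
	s := make([]float64, 2, 8)    // slice: brackets before the element type
	m := make(map[string]float64) // map: key type in brackets, value type after
	c := make(chan float64, 4)    // channel: keyword followed by the element type
	m["pi"] = 3.14
	return len(s), cap(s), len(m), cap(c)
}

// zeroValues shows that declared-but-unassigned variables hold a zero value
// chosen by the language, not by the programmer.
func zeroValues() (int, string, bool, bool) {
	var n int
	var s string
	var p *int
	var m map[string]int
	return n, s, p == nil, m == nil
}

func main() {
	fmt.Println(describe())
	fmt.Println(zeroValues())

	p := new(int) // new returns a pointer to a zeroed value
	fmt.Println(*p)

	unused := 42
	_ = unused // common workaround for the "declared and not used" error
}
```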
---
" ... A good collections library should have the choices required to get asymptotically optimal algorithms built with almost no fuss, with the choices between ordered/unordered and set/list/map and queue/stack/heap being easy to find. Java undeniably does this really well; even if Go still does this ok.
Sometimes when using Go channels, I think I might be better off just going back to Erlang due to its pattern matching. The most troublesome part of verifying the correctness of Erlang code is following the implicit protocol in the weakly typed structs being passed around. There is stronger typing in Go, but without normal subtype declarations, it becomes convoluted to declare a whole bunch of structs with some referential integrity (i.e. a FooReply struct that contains a ref to a FooRequest, let alone an IPPacket struct that contains a TCPPacket | UDPPacket, ... either a safe union or just normal subtyped references, etc.)
Thanks, Rob, for your comment. Some of the subtype requirements can be handled well in Go using interfaces. In my program too, I had some useful interfaces. "
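Note to self, a sketch of how the "IPPacket contains TCPPacket | UDPPacket" case is often approximated with an interface, as the reply suggests. The field names and the unexported marker method are my own assumptions, not anything from the quoted thread. The compiler does not check that the type switch is exhaustive, so this is weaker than a real safe union.

```go
package main

import "fmt"

// Payload is a closed set of packet bodies. The unexported marker method
// means only types declared in this package can satisfy it.
type Payload interface{ isPayload() }

type TCPPacket struct{ SrcPort, DstPort int }
type UDPPacket struct{ SrcPort, DstPort int }

func (TCPPacket) isPayload() {}
func (UDPPacket) isPayload() {}

// IPPacket holds exactly one Payload (or nil).
type IPPacket struct {
	Src, Dst string
	Body     Payload
}

// Describe dispatches on the dynamic type of the body with a type switch.
// Exhaustiveness is not checked, hence the default case.
func Describe(p IPPacket) string {
	switch b := p.Body.(type) {
	case TCPPacket:
		return fmt.Sprintf("tcp %d->%d", b.SrcPort, b.DstPort)
	case UDPPacket:
		return fmt.Sprintf("udp %d->%d", b.SrcPort, b.DstPort)
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(Describe(IPPacket{"10.0.0.1", "10.0.0.2", TCPPacket{1234, 80}}))
}
```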
---
" 2014-01-08 Another go at Go ... failed!
After a considerable gap, I gave Go another go!
The Problem
As part of a consulting engagement, I accepted a project to develop some statistical inference models in the area of drug (medicine) repositioning. Input data comprises three sets of associations: (i) between drugs and adverse effects, (ii) between drugs and diseases, and (iii) between drugs and targets (proteins). Using drugs as hub elements, associations are inferred between the other three kinds of elements, pair-wise.
The actual statistics computed vary from simple measures such as sensitivity (e.g. how sensitive is a given drug to a set of query targets?) and Matthews Correlation Coefficient, to construction of rather complex confusion matrices and generalised profile vectors for drugs, diseases, etc. Accordingly, the computational intensity varies considerably across parts of the models.
For the size of the test subset of input data, the in-memory graph of direct and transitive associations currently has about 15,000 vertices and over 1,400,000 edges. This is expected to grow by two orders of magnitude (or more) when the full data set is used for input.
Programming Language
I had some temptation initially to prototype the first model (or two) in a language like Ruby. Giving the volume of data its due weight though, I decided to use Ruby for ad hoc validation of parts of the computations, with coding proper happening in a faster, compiled language. I have been using Java for most of my work (both open source as well as for clients). However, considering the fact that statistics instances are write-only, I hoped that Go could help me make the computations parallel easily[1].
My choice of Go caused some discomfort on the part of my client's programmers, since they have to maintain the code down the road. No serious objections were raised nevertheless. So, I went ahead and developed the first three models in Go.
Practical Issues With Go
The Internet is abuzz with success stories involving Go; there isn't an additional perspective that I can add! The following are factors, in no particular order, that inhibited my productivity as I worked on the project.
No Set in the Language
Through (almost) every hour of this project, I found myself needing an efficient implementation of a set data structure. Go does not have a built-in set; it has arrays, slices and maps (hash tables). And, Go lacks generics. Consequently, whichever generic data structure is not provided by the compiler cannot be implemented in a library. I ended up using maps as sets. Everyone who does that realises the pain involved, sooner rather than later. Maps provide uniqueness of keys, but I needed sets for their set-like properties: being able to do minus, union, intersection, etc. I had to code those in-line every time. I have seen several people argue vehemently (even arrogantly) in golang-nuts that it costs just a few lines each time, and that it makes the code clearer. Nothing could be further from the truth. In-lining those operations has only reduced readability and obscured my intent. I had to consciously train my eyes to recognise those blocks to mean union, intersection, etc. They also were very inconvenient when trying different sequences of computations for better efficiency, since a quick glance never sufficed!
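Note to self, a minimal sketch of the map-as-set approach described above, with the set operations pulled into named methods instead of in-lined loops. The StringSet type and its method names are hypothetical, not the author's code. The element type is fixed to string, so under the no-generics constraint the post describes, every other element type would need its own copy.

```go
package main

import "fmt"

// StringSet uses a map with empty-struct values as a set of strings.
type StringSet map[string]struct{}

// NewStringSet builds a set from the given items.
func NewStringSet(items ...string) StringSet {
	s := make(StringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// Has reports whether k is a member of the set.
func (s StringSet) Has(k string) bool { _, ok := s[k]; return ok }

// Union returns a new set with the members of both sets.
func (s StringSet) Union(o StringSet) StringSet {
	r := make(StringSet, len(s)+len(o))
	for k := range s {
		r[k] = struct{}{}
	}
	for k := range o {
		r[k] = struct{}{}
	}
	return r
}

// Intersect returns a new set with the members common to both sets.
func (s StringSet) Intersect(o StringSet) StringSet {
	r := make(StringSet)
	for k := range s {
		if o.Has(k) {
			r[k] = struct{}{}
		}
	}
	return r
}

// Minus returns a new set with the members of s that are not in o.
func (s StringSet) Minus(o StringSet) StringSet {
	r := make(StringSet)
	for k := range s {
		if !o.Has(k) {
			r[k] = struct{}{}
		}
	}
	return r
}

func main() {
	a := NewStringSet("aspirin", "ibuprofen", "codeine")
	b := NewStringSet("ibuprofen", "codeine", "warfarin")
	fmt.Println(len(a.Union(b)), len(a.Intersect(b)), len(a.Minus(b)))
}
```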
Also, I found the performance of Go maps wanting. Profiling showed that get operations were consuming a good percentage of the total running time. Of course, several of those get operations are actually to check for the presence of a key.
No BitSet in the Standard Library
Since the performance of maps was dragging the computations back, I investigated the possibility of changing the algorithms to work with bit sets. However, there is no BitSet or BitArray in Go's standard library. I found two packages in the community: one on code.google.com and the other on github.com. I selected the former because it both performed better and provided a convenient iteration through only the bits set to true. Mind you, the data is mostly sparse, and hence both these were desirable characteristics.
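Note to self, a sketch of the kind of bit set the paragraph above asks for, including iteration over only the set bits. This is my own illustration, not either of the community packages mentioned; the BitSet type and its methods are hypothetical. It relies on math/bits from the standard library, which appeared in Go releases later than the one the post was written against.

```go
package main

import (
	"fmt"
	"math/bits"
)

// BitSet is a fixed-size set of small non-negative integers,
// stored as 64-bit words.
type BitSet struct{ words []uint64 }

// NewBitSet returns a bit set able to hold the values 0..n-1.
func NewBitSet(n int) *BitSet {
	return &BitSet{words: make([]uint64, (n+63)/64)}
}

// Set adds i to the set.
func (b *BitSet) Set(i int) { b.words[i/64] |= uint64(1) << uint(i%64) }

// Test reports whether i is in the set.
func (b *BitSet) Test(i int) bool {
	return b.words[i/64]&(uint64(1)<<uint(i%64)) != 0
}

// Each calls fn for every member in ascending order. Zero words are skipped
// and TrailingZeros64 jumps straight to the next set bit, which helps when
// the data is sparse.
func (b *BitSet) Each(fn func(i int)) {
	for w, word := range b.words {
		for word != 0 {
			t := bits.TrailingZeros64(word)
			fn(w*64 + t)
			word &= word - 1 // clear the lowest set bit
		}
	}
}

// And returns the intersection, sized to the shorter of the two sets.
func (b *BitSet) And(o *BitSet) *BitSet {
	n := len(b.words)
	if len(o.words) < n {
		n = len(o.words)
	}
	r := &BitSet{words: make([]uint64, n)}
	for i := 0; i < n; i++ {
		r.words[i] = b.words[i] & o.words[i]
	}
	return r
}

func main() {
	a := NewBitSet(200)
	for _, i := range []int{130, 3, 64} {
		a.Set(i)
	}
	a.Each(func(i int) { fmt.Println(i) })
}
```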
Incidentally, both the bit set packages have varying performance. I could not determine the sources of those variations, since I could not easily construct test data to reproduce them on a small scale. A well-tested, high performance bit set in the standard library would have helped greatly.
Generics, or Their Absence
The general attitude in the Go community towards generics seems to have degenerated into one consisting of a mix of disgust and condescension, unfortunately. Well-made cases that illustrate problems best served by generics are being dismissed with such impudence and temerity as to cause repulsion. That Russ Cox' original formulation of the now-famous tri-lemma is incomplete at best has not sunk in despite four years of discussions. Enough said!
In my particular case, I have six sets of computations that differ in:
types of input data elements held in the containers, and upon which the computations are performed (a unique combination of three types for each pair, to be precise),
user-specified values for various algorithmic parameters for a given combination of element types,
minor computational steps and
types (and instances) of containers into which the results aggregate.
These differences meant that I could not write common template code that could be used to generate six versions using extra-language tools (as inconvenient as that already is). The amount of boiler-plate needed externally to handle the differences very quickly became both too much and too confusing. Eventually, I resorted to six fully-specialised versions each of data holders, algorithms and results containers, just for manageability of the code.
This had an undesirable side effect, though: now, each change to any of the core containers or computations had to be manually propagated to all the corresponding remaining versions. It soon led to a disinclination on my part to quickly iterate through alternative model formulations, since the overhead of trying new formulations was non-trivial.
Poor Performance
This was simply unexpected! With fully-specialised versions of graph nodes, edges, computations and results containers, I was expecting very good performance. Initially, it was not very good. In single-threaded mode, a complete run of three models on the test set of data took about 9 minutes 25 seconds. I re-examined various computations. I eliminated redundant checks in some paths, combined two passes into one at the expense of more memory, pre-identified query sets so that the full sets need not be iterated over, etc. At the end of all that, in single-threaded mode, a complete run of three models on the test set of data took about 2 minutes 40 seconds. For a while, I thought that I had squeezed it to the maximum extent. And so thought my client, too! More on that later.
Enhancement Requests
At that point, my client requested three enhancements, two of which affected all the six + six versions of the models. I ploughed through the first change and propagated it through the other eleven specialised versions. I had a full taste of what was to come, though, when I was hit with the realisation that I was still working on Phase 1 of the project, which had seven proposed phases in all!
Back to Java!
I took a break of one full day, and did a hard review of the code (and my situation, of course). I quickly identified three major areas where generics and (inheritance-based) polymorphism would have presented a much more pleasant solution. I had already spent 11 weeks on the project, the bulk of that going into developing and evaluating the statistical models. With the models now ready, I estimated that a re-write in Java would cost me about 10 working days. I decided to take the plunge.
The full re-write in Java took 8 working days. The ease with which I could model the generic data containers and results containers was quite expected. Java's BitSet class was of tremendous help. I had some trepidation about the algorithmic parts. However, they turned out to be easier than I anticipated! I made the computations themselves parts of formally-typed abstract classes, with the concrete parts such as substitution of actual types, the user-specified parameters and minor variations implemented by the subclasses. Conceptually, it was clear and clean: the base computations were easy to follow in the abstract classes. The overrides were clearly marked so, and were quite pointed.
Naturally, I expected a reduction in the size of the code base; I was not sure by how much, though. The actual reduction was by about 40%. This was nice, since it came with the benefit of more manageable code.
The most unexpected outcome concerned performance: a complete run of the three models on the test set of data now took about 30 seconds! My first suspicion was that something went so wrong as to cause a premature (but legal) exit somewhere. However, the output matched what was produced by the Go version (thanks Ruby), so that could not have been true. I re-ran the program several times, since it sounded too good to be true. Each time, the run completed in about 30 seconds.
I was left scratching my head. My puzzlement continued for a while, before I noticed something: the CPU utilisation reported by /usr/bin/time was around 370-380%! I was now totally stumped. conky showed that all processor cores were indeed being used. How could that be? The program was very much single-threaded.
After some thought and Googling, I saw a few factors that potentially enabled a utilisation of multiple cores.
All the input data classes were final.
All the results classes were final, with all of their members being final too.
All algorithm subclasses were final.
All data containers (masters), the multi-mode graph itself, and all results containers had only insert and look-up operations performed on them. None had a delete operation.
Effectively, almost all of the code involved only final classes. And, all operations were append-only. The compiler may have noticed those; the run-time must have noticed those. I still do not know what is going on inside the JRE as the program runs, but I am truly amazed by its capabilities! Needless to say, I am quite happy with the outcome, too!
Conclusions
If your problem domain involves patterns that benefit from type parameterisation or[2] polymorphism that is easily achievable through inheritance, Go is a poor choice.
If you find your Go code evolving into having few interfaces but many higher-order functions (or methods) that resort to frequent type assertions, Go is a poor choice.
The Go runtime can learn a trick or two from JRE 7 as regards performance.
These may seem obvious to more informed people; but to me, it was some enlightenment!
[1] I tried Haskell and Elixir as candidates, but nested data holders with multiply circular references appear to be problematic to deal with in functional languages. Immutable data presents interesting challenges when it comes to cyclic graphs! The solutions suggested by the respective communities involved considerable boiler-plate. More importantly, the resulting code lost direct correspondence with the problem's structural elements. Eventually, I abandoned that approach. "
--- http://oneofmanyworlds.blogspot.com/2014/01/another-go-at-go-failed.html
--
"
http://golang.org/pkg/container/
http://docs.oracle.com/javase/tutorial/collections/interfaces/index.html
http://docs.oracle.com/javase/tutorial/collections/implementations/index.html "
"
pron 6 hours ago
... and those don't even include Java's long list of concurrent collections.
jwn 6 hours ago
Or the other Collection implementations offered by https://code.google.com/p/guava-libraries/.
jbooth 6 hours ago
Or the ability to write collections in the first place, as enabled by generics. You can't even write a collection in Go unless you're dealing with interface{} or unsafe.Pointer and casting a lot.
I really like the language, enough to use it and work around the lack of generics, but that's a glaring weakness. They need some way to enable container classes.
"
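Note to self, a sketch of what jbooth means by "dealing with interface{} ... and casting a lot": a container written against the empty interface accepts anything, and every read needs a type assertion that can fail at run time. The Stack type and SumInts helper are hypothetical names for illustration.

```go
package main

import "fmt"

// Stack stores values of any type behind the empty interface.
type Stack struct{ items []interface{} }

// Push adds v to the top of the stack.
func (s *Stack) Push(v interface{}) { s.items = append(s.items, v) }

// Pop removes and returns the top element; ok is false when the stack is empty.
func (s *Stack) Pop() (v interface{}, ok bool) {
	if len(s.items) == 0 {
		return nil, false
	}
	v = s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

// SumInts pops every element and adds them up. Each element has to be
// asserted back to int, and a wrong type is only detected at run time.
func SumInts(s *Stack) (sum int, err error) {
	for {
		v, ok := s.Pop()
		if !ok {
			return sum, nil
		}
		n, isInt := v.(int)
		if !isInt {
			return sum, fmt.Errorf("unexpected %T on stack", v)
		}
		sum += n
	}
}

func main() {
	s := &Stack{}
	s.Push(1)
	s.Push(2)
	s.Push("three") // compiles fine; the mistake surfaces only in SumInts
	fmt.Println(SumInts(s))
}
```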
--
" pcwalton 2 hours ago
Generics, inheritance, and the factory pattern are completely orthogonal features. Adding generics would not entail adding inheritance, mandating the factory pattern, or any other slippery slope feature. ... Generics have nothing to do with inheritance and the factory pattern. There are lots of languages that have generics but neither inheritance nor factories: SML, OCaml, Haskell, etc.
Inheritance makes generics harder, in fact, because type inference becomes undecidable in the general case.
Factories are basically a workaround for functions not being able to be freestanding (unattached to a class) in Java (the "kingdom of nouns"), a problem that Go doesn't have.
"
--
"
pjmlp 5 hours ago
... and the concurrent libraries of futures, fork-join, tasks
sixthloginorso 5 hours ago
Java's concurrency primitives and libraries are really overlooked in these discussions. Is the syntax too off-putting, or is it merely that Java is unfashionable?
RyanZAG 4 hours ago
Very much unfashionable for the HN crowd. Most of the new research in concurrency is happening in Java and funded by the high performance trading industry - an industry which is very far from the HN crowd. The new Java8 stampedlock is a good example. It's possible to implement it in C++ as well, but because of the guarantees required by the lock it is a very difficult lock to integrate into C++ code. On the other hand, the JRE guarantees the correct constraints for Java code making a stampedlock very easy to use [1]. The performance of a stampedlock also seems to be the best case for any multi-reader environment. [2]
[1] http://concurrencyfreaks.blogspot.com/2013/11/stampedlocktry...
[2] http://mechanical-sympathy.blogspot.ca/2013/08/lock-based-vs...
scott_s 29 minutes ago
I find your claim that most new research in concurrency happens in Java strange. Perhaps you are unfamiliar with academic research in concurrency and parallelism? A way to get a small taste is to look at recent papers from the conference Principles and Practice of Parallel Programming (PPoPP).
jamra 3 hours ago
I don't think there is anything wrong with Java as a language, but I have a question for you. Does Java's concurrency engine allow for many threads like Go, or is it similar to C# in that threads are not lightweight and are therefore limited to i×N, where N is the number of CPU cores and i is a small integer < 10?
pcwalton 2 hours ago
You can't spawn 80 1:1 pthreads on an 8-core machine? Huh?
Operating systems have been able to handle an order of magnitude more pthreads than that since the day pthreads were introduced.
pron 3 hours ago
Java doesn't have a "concurrency engine", but a very large set of concurrency primitives: schedulers, locking and lock-free data structures, atomics etc.
To answer your question, yes: my own library, Quasar[1], provides lightweight threads for Java.
[1]: https://github.com/puniverse/quasar
pjmlp 2 hours ago
The Java language specification does not define how threads are implemented.
The first set of JVMs did implement green threads, which are what goroutines are. Shortly thereafter most of them switched to red threads, aka real threads.
You can still find a few JVMs that use green threads, like Squawk.
https://java.net/projects/squawk/pages/SquawkDesign
Other than that, java.util.concurrent provides mechanisms to map multiple tasks to single OS threads.
"
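Note to self, a small sketch of the "many lightweight threads" side of jamra's question: Go code commonly starts far more goroutines than there are cores, and the runtime multiplexes them onto OS threads. The SumSquares function is a made-up example, and the count of 10000 is arbitrary. It says nothing about how Java or pthreads compare.

```go
package main

import (
	"fmt"
	"sync"
)

// SumSquares starts one goroutine per input value and collects the results
// over a buffered channel.
func SumSquares(n int) int {
	var wg sync.WaitGroup
	results := make(chan int, n) // buffered so no sender ever blocks

	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(x int) {
			defer wg.Done()
			results <- x * x
		}(i)
	}

	wg.Wait()      // every goroutine has sent its result
	close(results) // lets the range loop below terminate

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	// Ten thousand goroutines, far more than the core count of a typical machine.
	fmt.Println(SumSquares(10000))
}
```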
--
"
cosn 51 minutes ago
I wrote most of those already, if anyone needs them, help yourself. Haven't had the need for a bit set, but I guess I can add it to the TODO list :)
https://github.com/cosn/collections/
That being said, I completely agree that the author chose the wrong language for the problem at hand.
redbad 5 hours ago
> Go's zoo of builtin data structures is really, really
> poor compared to Java.
This is a poor comparison because idiomatic Go typically doesn't use the `container` package.
"
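Note to self, for reference on what using the `container` package looks like: a sketch with container/heap, following the shape of the standard library's IntHeap example. The Sorted helper is my own addition. The user supplies the storage and five methods, and Push and Pop traffic in interface{}, which may be part of why the package sees little use.

```go
package main

import (
	"container/heap"
	"fmt"
)

// IntHeap is a min-heap of ints implementing heap.Interface.
type IntHeap []int

func (h IntHeap) Len() int           { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

// Push and Pop use pointer receivers because they change the slice length.
func (h *IntHeap) Push(x interface{}) { *h = append(*h, x.(int)) }

func (h *IntHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// Sorted pushes every value onto a heap and pops them back in ascending order.
func Sorted(vals ...int) []int {
	h := &IntHeap{}
	for _, v := range vals {
		heap.Push(h, v)
	}
	out := make([]int, 0, len(vals))
	for h.Len() > 0 {
		out = append(out, heap.Pop(h).(int)) // assertion needed on the way out
	}
	return out
}

func main() {
	fmt.Println(Sorted(5, 2, 8, 1))
}
```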
---
" Either stick with what you know or use a tool that has very specific optimizations for the problem you're going to tackle (and no, concurrency is not a good option to jump off the JVM as it has incredible concurrency support already: see https://github.com/LMAX-Exchange/disruptor for the fastest concurrent bus I've seen in any language). "
--
"
_pmf_ 4 hours ago
> see https://github.com/LMAX-Exchange/disruptor for the fastest concurrent bus I've seen in any language
The genius behind LMAX is the way they bend Java's object layout features; they achieve nice performance in spite of using Java, not because of it. Some decade-old message passing libraries (OpenMP