notes-computer-programming-programmingLanguageDesign-prosAndCons-generics

http://research.swtch.com/generic

http://www.osl.iu.edu/publications/prints/2005/siek05:_fg_pldi.pdf

http://www.bluebytesoftware.com/blog/2011/10/23/OnGenericsAndSomeOfTheAssociatedOverheads.aspx

http://www.osl.iu.edu/publications/prints/2003/comparing_generic_programming03.pdf

There's a newer paper (2007) though it's behind a paywall.

    andrewducker (December 3, 2009 12:55 PM) That paper is based on a prototype of generics in C#, before the final version was completed - implicit typing is mentioned as lacking, whereas the final version supported it (for instance).
    David Andersen (December 3, 2009 1:28 PM) Hi, Russ - not a comment about your questions, but a meta-point: Your post didn't address the question of whether the containers are statically type-safe. While one could imagine this being done orthogonally to how the code works (e.g., static type checking, but then box/unbox with the same impl), in practice, it seems that most of the time the generics also provide the type safety, and the boxing/unboxing approaches provide only runtime type checking. In practice, the question seems to be: slow compilers, bloated binaries, but more static type safety. Still a question, of course.

" The Generic Dilemma Posted on Thursday, December 3, 2009.

Generic data structures (vectors, queues, maps, trees, and so on) seem to be the hot topic if you are evaluating a new language. One of the most frequent questions we've had about Go is where the generics are. It seems like there are three basic approaches to generics:

    (The C approach.) Leave them out.
    This slows programmers.
    But it adds no complexity to the language.
    (The C++ approach.) Compile-time specialization or macro expansion.
    This slows compilation.
    It generates a lot of code, much of it redundant, and needs a good linker to eliminate duplicate copies. The individual specializations may be efficient but the program as a whole can suffer due to poor use of the instruction cache. I have heard of simple libraries having text segments shrink from megabytes to tens of kilobytes by revising or eliminating the use of templates.
    (The Java approach.) Box everything implicitly.
    This slows execution.
    Compared to the implementations the C programmer would have written or the C++ compiler would have generated, the Java code is smaller but less efficient in both time and space, because of all the implicit boxing and unboxing. A vector of bytes uses significantly more than one byte per byte. Trying to hide the boxing and unboxing may also complicate the type system. On the other hand, it probably makes better use of the instruction cache, and a vector of bytes can be written separately.

The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?

I would be happy to learn about implementations that somehow manage to avoid all three of these bad outcomes, especially if there were good written descriptions (papers, blog posts, etc.) about what was hard and why it's a good approach. I'd also be interested to see good written descriptions of trying one of those approaches and what went wrong (or right).

Please leave comments with pointers. Thanks.

    yuku (December 3, 2009 9:54 AM) How about C# implementation? It lets you use primitive types.
    Barry Kelly (December 3, 2009 10:00 AM) The massive gaping hole in your list of approaches is .NET: reified generics, but in a reference-type oriented language.
    C++ suffers because it has lots of support for value-oriented programming, with "clever" tricks like copy constructors, assignment operators, implicit conversions, etc. The lack of GC encourages flirting with almost-but-not-quite-good-enough smart pointers, and storing smart pointers in collections means every collection needs to call the appropriate copy constructors, destructors, etc. for internal operations. It makes it hard to share code.
    .NET has reified generics, but all generic instantiations with reference types use the same actual instantiation behind the scenes, using the hidden type "System.__Canon" as the type argument. This requires smuggling in the actual type argument list in certain locations - e.g. a hidden argument to generic method calls, a field in the type descriptor for distinct instantiated types, etc.
    .NET generics are also instantiated at runtime, which means it can share instantiations across multiple dynamically-linked libraries. That's somewhat harder to get right in a precompiled environment; a few indirections and you may be able to get most of the benefit by having only one instantiation active and in caches, leaving the others cold and not paged in.
    andrewducker (December 3, 2009 10:15 AM) http://msdn.microsoft.com/en-us/library/ms379564(VS.80).aspx talks about the .Net Generic implementation in a bit more detail.
    Paul Snively (December 3, 2009 10:51 AM) I would also take a look at BitC's polyinstantiation system. In fact, in general, I would take as much inspiration from BitC as you can without sacrificing whatever specific goals you may have (e.g. blinding compilation speed) that BitC doesn't necessarily share.
    newt0311 (December 3, 2009 11:11 AM) Super-compilers would solve this problem but making them work effectively requires some major developments in compiler research. Its one of the few areas of academic CS that is useful.

---

" Rivorus (December 3, 2009 1:53 PM) [2-part post]

Go should definitely try and support the Generic Programming paradigm as envisioned by Stepanov, which most closely is accomplished with C++ templates, overloading, and specialization (which is why the STL was implemented in C++ to begin with). His recent book "Elements of Programming" does a great job of walking through Generic Programming, which I highly recommend all programmers read, no matter their language background.

That's not to say that Go needs templates in the C++ sense, all it needs is type functions, a way to specify true Generic Programming concepts (including associated types and what would have been C++0x concept_maps, etc. not just the basic name/function signature matching of Go interfaces), and a way to specialize/overload algorithms based on concept refinements and specific models of concepts. ... Keep in mind that C++'s templates with respect to generic algorithms and datastructures have much more of an advantage than being "fast", they figure out dispatching at compile-time and are able to deal with type/concept information at compile-time meaning more errors caught before the program is even run. As well, as you bring more relationships to compile-time, certain complicated operations such as a subset of the occasions you'd resort to multi-dispatch in a Java or C# world are instead resolved at compile-time via simple overloading and/or specialization. Being generic and being fast are not mutually exclusive as other languages seem to make people think. It is possible to get abstraction with no extra runtime overhead.

In my opinion, if Go is to be relevant to the Generic Programmers of C++, it needs to support the Generic Programming paradigm more concisely than C++ templates without sacrificing functionality, which while is difficult, is far from an impossible task. This is particularly important as C++0x has dropped direct support for concepts meaning that if Go beats C++ to its own game it will truly be in a league of its own. As I mentioned early, really all you need to satisfy most Generic Programmers needs are a way to represent concepts, type functions, a way to specialize/overload algorithms based on concept requirements, and preferably something like what would have been C++0x concept_maps. These are all that it takes to make a statically-typed language that supports libraries like the STL, and yet other mainstream languages miss that.

Again, Stepanov's "Elements of Programming" is a great place to look to get a more mathematical view of Generic Programming and I encourage you to please read it. Go's interfaces are a tiptoe in the right direction and if they can be expanded to encompass concepts you will begin to see many more programmers migrating over to Go from C++. "

    Jonathan Amsterdam (December 6, 2009 7:23 PM) Rivorus - I understand what you mean when you talk about a complete concept/template system implying metaprogramming. (Haven't read Stepanov yet, but I'm eager to do so.) But it doesn't follow that any template system that falls short is "handicapped." The generics of C#, Haskell, Eiffel and Ada relate to the theory of parametric polymorphism, where the idea is simply to parameterize types over types. In a sense, any such system is weaker than full metaprogramming, just as regular expressions are weaker than Turing machines. But that doesn't mean that they are deficient. They are just solving a different problem.
    Rivorus (December 6, 2009 11:41 PM) True, I suppose "handicapped" is a more harsh term than I intended. What I mean to say is that it rules out a particular approach to programming that is becoming more common, and for good reason, particularly in generic libraries of modern C++. There is a lot of potential for creating extremely powerful and generic libraries when you have facilities for type functions and ways to specialize algorithms and datastructures based on compile-time type and concept information.
    The Boost Graph Library is a great example of that, and to a similar but lesser extent the STL. If languages are to advance in the direction of supporting more generic code I think that we need to focus on taking those ideas and translating them into something simpler than what currently exists as C++ templates but without removing overall functionality.
    The paper linked by drhodes is a good reference to go by. If Go can be made to support the programming of a library like the Boost Graph Library better than all the languages listed it will be a step ahead for generic programming. In my opinion, anything else is a step back or at the very least unnecessary stagnation.
    sambeau (December 7, 2009 3:44 PM) >"Composable typeclasses would fit Go's orthogonal philosophy like a glove, IMHO."
    +1
    ncm (December 11, 2009 2:44 PM) "Go already has built-in features that obviate the need for some common C++ generics (e.g. GC makes smart pointers pointless)"
    Built-in features that duplicate library constructs in another language indicate weakness, not strength. They imply that the language is not strong enough to address those needs without special-casing. We might therefore point out instead that C++'s destructors make GC pointless, and (further) illustrate GC's weakness. GC can only manage memory, but not the numerous other resources that are also routinely managed with destructors. (We might add that exceptions are weakened, as an architectural aid, by the lack of destructors, to the point of near uselessness.)
    It betrays fundamental confusion to group Haskell among C#, Eiffel, and Ada. C++ and Haskell can express what these other languages cannot. Regular expressions really are deficient, unless you have resolved to ignore any problem your language is not up to addressing. "Handicapped" is simply accurate.
    A simpler language that is as powerful as C++ is long overdue. Go, sadly, isn't it. Go will grow, but it's fatally handicapped by unfortunate early design choices. C++ will get concepts and modules in 2015. Those won't make it a simpler language, but it will be simpler to program in, and compile times will come down. That will carry us until a plausible successor emerges.
    Julian Morrison (December 11, 2009 3:06 PM) ncm, you forget that GC also manages ownership and allows all code to share the same ownership/destruction policy, which is vital for Java-style glue code that links together assortments of libraries.
    "finally" blocks or Go's "defer" take care of deterministic resource reclamation.
    Russ Cox (December 11, 2009 3:12 PM) C++ will get concepts and modules in 2015.
    Is your suggestion that we sit on our hands until then?
    ncm (December 11, 2009 3:42 PM) "Finally" blocks are the inadequate response to the fatal weakening of exceptions. They wipe out the centralization of error handling that was the architectural purpose of exceptions in the first place. 

As the paradigm has become better understood and libraries like the STL, Adobe/Boost's Generic Image Library, and the Boost Graph Library have come about, we have learned what fundamental functionality should exist to allow proper generic programming. Despite the onslaught of features that get thrown into languages like C# they still simply don't get it. In particular, you should have the ability to express concepts with associated types, concept refinement, overloading/specialization of algorithms based on which concepts arguments model, and the ability to produce type functions. C++ strives because it is powerful enough to hackishly represent many of these features through templates to some degree. All you have to do is support this functionality directly and you will be a step ahead with respect to generic programming.

"

    Here's my personal take on things, in no particular order. One "problem" that jumps out at me personally, in reference to the language FAQ, is in the question "Why are maps, slices, and channels references while arrays are values?". The rationale provided is very weak -- it talks about how early on the implementation of maps were "syntactically" pointers and that you couldn't create a non-dynamic instance. Further, it goes on to say "...we struggled with how arrays should work." Neither of these are answers to the question.
    I'll try to shed some light on this from a C++ perspective. In C++, maps, and all other containers, are values. In fact, with only some embarrassing exceptions, all object types in C++ have proper semantics of regular types (those exceptions notably being legacy C-arrays, which is one reason why the TR1/boost array template exists, and std::auto_ptr which exists as-is because of the lack of move semantics in the current standard and the need to efficiently and safely return dynamically allocated data from functions). Consistent semantics for regular types are extremely important to generic programming and that is why generic datastructures in C++ are as powerful as they are. You need a generic way of efficiently working with the fundamental capabilities of all types with a consistent high level menaing (for example, a single interface for copyability and what that means with respect to equality).
    Similarly, the FAQ states:
    "Map lookup requires an equality operator, which structs and arrays do not implement. They don't implement equality because equality is not well defined on such types; there are multiple considerations involving shallow vs. deep comparison, pointer vs. value comparison, how to deal with recursive structures, and so on. We may revisit this issue—and implementing equality for structs and arrays will not invalidate any existing programs—but without a clear idea of what equality of structs and arrays should mean, it was simpler to leave it out for now."
    As I touched on in the previous paragraph, you need to represent the concept of regular types. You have a notion of copyability of arrays already and that should give hint as to how equality should be implemented. By very definition, making a copy of something means making another instance with the same value. Referential equality is a separate concern and should not be confused. The act of copying implies that the values of each object are now equal as should be reflected by a well-defined equality operator. If after a copy the two objects do not compare as equal either the copy operation wasn't truely a copy operation or the equality operation wasn't correctly defined.
    Rivorus (December 12, 2009 12:58 PM) [multipart post continued]
    Another decision that I am in much disagreement with is the lack of what C++ programmers know as destructors. This is very briefly mentioned here, and a rationale is virtually nonexistent. I am left to believe that such functionality was left out simply because the designers of Go do not understand how important destructors really are. Herb Sutter has covered this topic already fairly well and explains why destructors should be a preferred default over features like Go's "defer." That's not to say that defer is bad, it's just that it is no replacement for destructors. Using "defer" has to be done explicitly whenever a type is used rather than being specified once in a type's definition and used implicitly from then on.
    In the Go language specification you can see a great example of where destructors are better suited than defer statements. When you create a lock, it would be a very bad idea to never unlock or to unlock at some nondeterministic time. A destructor makes this automatic and implicit. There is no defer statement that you can forget to write because unlocking is already implied. In fact, this is exactly how locks are implemented in Boost and how they are implemented in the next C++ standard. A similar need for destructors exists with any type that manages resources, which is many if not most non-trivial types. RAII is important and constructors/destructors are the key way these are implemented in C++. There is currently no simple way to do this implicitly in Go.
    That alone is already a solid argument for destructors, but what does that lack of destructors mean specifically with respect to generic programming? A lot, actually. Whenever you use a type with a generic datastructure or algorithm and that code is responsible for creating instances of your type, you should hope that resources are deterministically cleaned up when an instance of the type leaves scope. You can't write a defer statement in generic code if you don't know what exactly needs to be cleaned up. This is an implementation detail that generic code doesn't (and shouldn't) know about. With features like destructors, the correct behavior happens implicitly.
    Rivorus (December 12, 2009 1:04 PM) [multipart post continued]
    Finally, there is the issue of overloading. Once again, the FAQ has a very weak rationale, talking about the fact that it makes implementation of the language simpler and that it can be "confusing." Certain things might be tough to understand at first but that doesn't mean that they aren't important to understand. For many programmers, overloading is already second nature. Not only that, but it is extremely important to generic programming and leaving it out means not being able to create efficient and generic libraries.
    In a statically-typed language, overloads dispatch based on data types, and in an ideal world, on what concepts they model. In a well-designed generic program in a statically-typed language you have high level generic algorithms built around smaller and smaller algorithms, each operating on models of concepts. If there is a more refined concept or a particular type that can have a more efficient implementation of a given algorithm, then that algorithm is overloaded for that type or concept. In C++, since all of the overloading is based purely on type/concept data, which functions are dispatched is decided at compile time and you get zero overhead.
    The reason why this type of overloading is so necessary to generic programming and why simply naming functions differently doesn't cut it is because of the hierarchical nature of algorithms. A calls B which calls C which calls D and E, etc. each one with a generic implementation that can work on anything that models a particular concept, for instance, any container or range with access to any element taking linear time. Now imagine that algorithm "C" can be written in a way that is more efficient when it is passed a range that allows accessing of arbitrary elements in constant time rather than linear time. If you can overload C based on this type information, then you in effect make it so that algorithms A and B are also each automatically more efficient when you pass in a range with constant-time access to any element.
    Without this ability to overload based on the type system you cannot get composable and efficient generic code. In reference to the example above, making a differently named "C" for the special case optimized situation rather than overloading implies having to make a differently named "A" and "B" as well if you want them to take advantage of your optimized code. This means that more algorithms have to be written if they need to be efficient rather than getting it for free and anyone using your algorithms now has to know to call your special case algorithm with the new name if he wants the more optimal implementation. If their code is already written, that potentially means they'd have to rewrite it if they need it to be faster simply because an implementation detail changed at some arbitrary location down the chain. It's just bad news for everyone. All of that is possible in C++ because of the type system, overloading and SFINAE. While the concept-like facility in Go is nice, without overloading and associated types it's only half-way there."

Nice strawman that turns reality on its head. ncm made no such suggestion, but rather drew a conclusion from an observation about Go -- "unfortunate early design choices". It's not just choices seen in the language, but choices about the language -- basic design philosophy. It is now mid-2011 with no glimmer of a generic facility. One can predict, from that design philosophy and from seeing Go's incredibly weak version of exceptions, that if it has generics by 2015 they too will be incredibly weak. Getting generics right takes a lot of careful work, by people who actually believe in them and are aware of their power and the wealth of accumulated experience in programming language design. Take a look, for instance, at D, Scala, and Jade -- a spinoff of Emerald, which as you know has interfaces much like Go's.