proj-plbook-plPartPractices

Table of Contents for Programming Languages: a survey

Part IV: Idioms, patterns, and best practices

DRY

YAGNI

visitor

strategy pattern

factory pattern

inversion of control

http://en.wikipedia.org/wiki/Inversion_of_control

http://en.wikipedia.org/wiki/Dependency_injection (a form of inversion of control)

http://en.wikipedia.org/wiki/Template:Design_Patterns_patterns

misc

two-way model binding

" ...the most important factor in making your code maintainable in the long run is breaking dependencies. If piece X depends on seven other pieces and isn't a trivial bit of wiring code, that's a smell. " -- Nick Stenning, http://list.hypothes.is/archive/dev/2014-09/

futureproofing patterns

(cross-reference extensibility)

http://journal.stuffwithstuff.com/2010/09/18/futureproofing-uniform-access-and-masquerades/

points out three kinds of futureproofing in java:

these are all annoying boilerplate, but they are all good things to do because if you don't, and you want to make one of the following changes, then you must change every call site, instead of just one line. This is especially bad if your program is a shipped library and the call sites are in client code (e.g. you would have to make breaking/incompatible changes to your library):

now, Jasper already deals with the first two of these, and maybe the third. If not, we should probably deal with the third, too. That is to say, when you call a constructor (if we have constructors at all; i'm leaning towards yes), you don't actually determine an implementation but merely an interface, and perhaps a factory method to determine the implementation. In other words, 'everything is an interface', like we always say.

in http://journal.stuffwithstuff.com/2010/10/21/the-language-i-wish-go-was/, he also points out a few more that Go handles that Java doesn't:

in http://journal.stuffwithstuff.com/2010/10/21/the-language-i-wish-go-was/, he also points out other kinds of futureproofing that aren't needed in Java but that may be needed in other languages, such as Go:

book rec: design patterns

Chapter: Types of coding tasks

numeric linear algebra:

operator overloading is crucial

https://news.ycombinator.com/item?id=6284842

embedded:

memory footprint is crucial (often, GC isn't good enough, e.g. Objective-C)

hobby project:

little boilerplate; no need for team features

simulation: CPU-bound; speed; OOP; parallelism

Concurrency patterns

overuse of synchronization leads to throwing away a lot of concurrency. advanced algorithms such as lock-free algorithms accept/permit nondeterminacy as much as possible during their execution, but still give deterministic results at the end. but they are difficult to write/reason about.

if using basic locking, where to lock in data structures (e.g. to create critical sections): create a critical section out of any piece of code s.t. if you stopped in the middle, an invariant would be violated; or any piece of code s.t. if another thread updated shared memory in the middle, there would be a problem.

beware: returns, gotos, possible exceptions; anything that could interrupt you

beware: function calls that might call something else that would block on one of the locks you've acquired
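
A minimal sketch of the locking advice above, in Python. The `Accounts` class and its invariant (the two balances always sum to the same total) are hypothetical, chosen only for illustration. The `with` statement releases the lock on every exit path, including early returns and exceptions, which covers the first 'beware'. A reentrant lock (`threading.RLock`) is one way to handle the second 'beware' when the nested call is made by the same thread; it does not help with lock-ordering deadlocks between different threads.

```python
import threading


class Accounts:
    """Two balances whose sum must stay constant (the invariant)."""

    def __init__(self, a, b):
        self._lock = threading.RLock()  # reentrant: same thread may re-acquire
        self.a = a
        self.b = b

    def total(self):
        with self._lock:
            return self.a + self.b

    def transfer(self, amount):
        # Critical section: between the two updates the invariant is broken,
        # so no other thread may read or modify the balances in here.
        with self._lock:
            if amount > self.a:
                raise ValueError("insufficient funds")  # lock is still released
            self.a -= amount
            self.b += amount
            return self.total()  # nested acquire is fine with an RLock


accounts = Accounts(1000, 0)


def worker():
    for _ in range(100):
        accounts.transfer(1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a plain `threading.Lock` in place of the `RLock`, the nested call to `total()` inside `transfer()` would block forever on the lock the thread already holds.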

POSIX, pthreads

safety patterns

http://spinroot.com/gerard/pdf/P10.pdf is a great read. it has 10 simple rules for safety:

" 1. Rule: Restrict all code to very simple control flow constructs – do not use goto statements, setjmp or longjmp constructs, and direct or indirect recursion.

Rationale: Simpler control flow translates into stronger capabilities for verification and often results in improved code clarity. The banishment of recursion is perhaps the biggest surprise here. Without recursion, though, we are guaranteed to have an acyclic function call graph, which can be exploited by code analyzers, and can directly help to prove that all executions that should be bounded are in fact bounded. (Note that this rule does not require that all functions have a single point of return – although this often also simplifies control flow. There are enough cases, though, where an early error return is the simpler solution.)

2. Rule: All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded...This rule does not, of course, apply to iterations that are meant to be non-terminating (e.g., in a process scheduler). In those special cases, the reverse rule is applied: it should be statically provable that the iteration cannot terminate.

...

3. Rule: Do not use dynamic memory allocation after initialization.

...

4. Rule: No function should be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.

5. Rule: The assertion density of the code should average to a minimum of two assertions per function. Assertions are used to check for anomalous conditions that should never happen in real-life executions. Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition to the caller of the function that executes the failing assertion.

...

A typical use of an assertion would be as follows:

if (!c_assert(p >= 0) == true) { return ERROR; }

with the assertion defined as follows:

#define c_assert(e) ((e) ? (true) : \
    tst_debugging("%s,%d: assertion '%s' failed\n", \
    __FILE__, __LINE__, #e), false)

In this definition, __FILE__ and __LINE__ are predefined by the macro preprocessor to produce the filename and line-number of the failing assertion. The syntax #e turns the assertion condition e into a string that is printed as part of the error message.

...

6. Rule: Data objects must be declared at the smallest possible level of scope.

Rationale: This rule supports a basic principle of data-hiding. Clearly if an object is not in scope, its value cannot be referenced or corrupted. Similarly, if an erroneous value of an object has to be diagnosed, the fewer the number of statements where the value could have been assigned; the easier it is to diagnose the problem....

7. Rule : The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.

... In its strictest form, this rule means that even the return value of printf statements and file close statements must be checked. One can make a case, though, that if the response to an error would rightfully be no different than the response to success, there is little point in explicitly checking a return value.... In cases like these, it can be acceptable to explicitly cast the function return value to (void) – thereby indicating that the programmer explicitly and not accidentally decides to ignore a return value.

8. Rule: The use of the preprocessor must be limited to the inclusion of header files and simple macro definitions. Token pasting, variable argument lists (ellipses), and recursive macro calls are not allowed. All macros must expand into complete syntactic units. The use of conditional compilation directives is often also dubious, but cannot always be avoided. This means that there should rarely be justification for more than one or two conditional compilation directives even in large software development efforts, beyond the standard boilerplate that avoids multiple inclusion of the same header file. Each such use should be flagged by a tool-based checker and justified in the code.

Rationale: The C preprocessor is a powerful obfuscation tool that can destroy code clarity and befuddle many text based checkers....Note that with just ten conditional compilation directives, there could be up to 2^10 possible versions of the code, each of which would have to be tested– causing a huge increase in the required test effort.

9. Rule: The use of pointers should be restricted. Specifically, no more than one level of dereferencing is allowed. Pointer dereference operations may not be hidden in macro definitions or inside typedef declarations. Function pointers are not permitted.

Rationale: Pointers are easily misused, even by experienced programmers. They can make it hard to follow or analyze the flow of data in a program, especially by tool- based static analyzers. Function pointers, similarly, can seriously restrict the types of checks that can be performed by static analyzers and should only be used if there is a strong justification for their use, and ideally alternate means are provided to assist tool-based checkers determine flow of control and function call hierarchies. For instance, if function pointers are used, it can become impossible for a tool to prove absence of recursion, so alternate guarantees would have to be provided to make up for this loss in analytical capabilities.

10. Rule: All code must be compiled, from the first day of development, with all compiler warnings enabled at the compiler’s most pedantic setting. All code must compile with these setting without any warnings. All code must be checked daily with at least one, but preferably more than one, state-of-the-art static source code analyzer and should pass the analyses with zero warnings.

...

" -- http://spinroot.com/gerard/pdf/P10.pdf

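The rules above are written for C, but a couple of them translate loosely to other languages. The following is only a sketch, in Python, of rule 2 (fixed loop bound) and rule 5 (checks with an explicit recovery action); the function `find_root_bisect` and the constant `MAX_ITERATIONS` are hypothetical names, not from the paper.

```python
MAX_ITERATIONS = 64  # fixed upper bound, known before the loop starts


def find_root_bisect(f, lo, hi, tol=1e-9):
    """Return (ok, value); ok is False when a precondition check fails."""
    # Assertion-style checks with an explicit recovery action:
    # report an error to the caller instead of carrying on.
    if not (lo < hi):
        return False, None
    if not (f(lo) * f(hi) <= 0):
        return False, None

    # Bounded loop: at most MAX_ITERATIONS passes, never "while True".
    for _ in range(MAX_ITERATIONS):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return True, (lo + hi) / 2.0
```

The caller is expected to check the `ok` flag, in the spirit of rule 7.
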
GRASP

https://en.wikipedia.org/wiki/GRASP_%28object-oriented_design%29

Patterns:

design criteria for programs

software quality metrics

Links

todo

"Joshua Bloch, in his book “Effective Java” says, “Favor composition over inheritance”. " -- via http://www.smashcompany.com/technology/object-oriented-programming-is-an-expensive-disaster-which-must-end

Dependency injection

https://martinfowler.com/articles/injection.html talks about different mechanisms for dependency injection, and a similar pattern called service locator. Dependency injection is when you have some class A that uses (and possibly even constructs) class B, but you want to generalize A so that it can use any class in place of B. To do that, you have A take B as a parameter in some form or another, and you have some other registry that provides the 'B' parameter value. In dependency injection the registry calls class A in order to 'inject' the 'B' parameter value. Since the registry calls A (rather than A calling the registry), this is an instance of 'inversion of control'; so 'dependency injection' is a special case of 'inversion of control' in which the purpose of the inversion is to provide the 'B' parameter value (the 'dependency'). The methods described by the article are constructor injection, setter injection, and interface injection.

The article then compares these to service locators. With a service locator, A has a reference to the registry and calls the registry to get B. An advantage of service locators is that they are conceptually simpler. A disadvantage is that, in order to see what dependencies A has, you have to look for all calls to the registry, rather than just looking at static metadata like constructor parameters, setters defined, or interfaces implemented. Another disadvantage arises when one big service locator is used for many services throughout a program: the locator can become too 'fat' and hard to mock when testing A, whereas with dependency injection the person writing A controls the complexity of everything needed for testing. This can be avoided if the designer of the service locator pays attention to it.

Fowler prefers service locators because they are simpler, except when building classes to be used in multiple applications, in which case he prefers dependency injection. Within dependency injection, he prefers constructor injection, because it's simpler, but setter injection if there is some reason to use it.
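
To make the contrast concrete, here is a small sketch in Python of constructor injection next to a service locator. All names (`SmtpMailer`, `FakeMailer`, `SignupWithInjection`, `ServiceLocator`, `SignupWithLocator`) are hypothetical and are not taken from Fowler's article.

```python
class SmtpMailer:
    def send(self, to, body):
        return f"smtp to {to}: {body}"


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
        return "recorded"


# Constructor injection: the dependency is visible in the signature,
# and whoever constructs the object decides which mailer it gets.
class SignupWithInjection:
    def __init__(self, mailer):
        self._mailer = mailer

    def register(self, user):
        return self._mailer.send(user, "welcome")


# Service locator: the class itself asks a registry for its dependency.
class ServiceLocator:
    _services = {}

    @classmethod
    def provide(cls, name, service):
        cls._services[name] = service

    @classmethod
    def get(cls, name):
        return cls._services[name]


class SignupWithLocator:
    def register(self, user):
        mailer = ServiceLocator.get("mailer")  # dependency hidden in the body
        return mailer.send(user, "welcome")
```

To find out what `SignupWithLocator` depends on you have to read its method bodies; for `SignupWithInjection` the constructor signature is enough.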

Single Responsibility Principle

Single Responsibility Principle vs. encapsulation

" The Single Responsibility Principle (SRP) says that a class should have one, and only one, reason to change. To say this a different way, the methods of a class should change for the same reasons, they should not be affected by different forces that change at different rates.

As an example, imagine the following class in Java:

class Employee {
  public Money calculatePay() {…}
  public void save() {…}
  public String reportHours() {…}
}

This class violates the SRP because it has three reasons to change. The first is the business rules having to do with calculating pay. The second is the database schema. The third is the format of the string that reports hours....Of course this seems to fly in the face of OO concepts since a good object should contain all the methods that manipulate it. " -- [6]
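
One possible way to split the quoted `Employee` class along its three reasons to change, sketched in Python. The class names `PayCalculator`, `EmployeeRepository`, and `HoursReporter`, the fields, and the in-memory storage are all assumptions made for illustration; the quoted source does not prescribe this particular split.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    hourly_rate: float
    hours: float


class PayCalculator:
    """Changes when the business rules for pay change."""

    def calculate_pay(self, employee):
        return employee.hourly_rate * employee.hours


class EmployeeRepository:
    """Changes when the storage schema changes (here: an in-memory dict)."""

    def __init__(self):
        self._rows = {}

    def save(self, employee):
        self._rows[employee.name] = (employee.hourly_rate, employee.hours)

    def count(self):
        return len(self._rows)


class HoursReporter:
    """Changes when the report format changes."""

    def report_hours(self, employee):
        return f"{employee.name}: {employee.hours:.1f} h"
```

The tension mentioned at the end of the quote is visible here: `Employee` is now plain data, and the methods that manipulate it live elsewhere.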

Abstract factory pattern

"The abstract factory pattern provides a way to encapsulate a group of individual factories that have a common theme without specifying their concrete classes...the client software creates a concrete implementation of the abstract factory and then uses the generic interface of the factory to create the concrete objects that are part of the theme. The client doesn't know (or care) which concrete objects it gets from each of these internal factories, since it uses only the generic interfaces of their products." -- [7]

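A short sketch of the abstract factory idea in Python. The 'theme' here is a light or dark set of widgets; every class name is hypothetical and chosen only to show the shape of the pattern.

```python
from abc import ABC, abstractmethod


class Button(ABC):
    @abstractmethod
    def render(self): ...


class Checkbox(ABC):
    @abstractmethod
    def render(self): ...


class LightButton(Button):
    def render(self):
        return "[light button]"


class LightCheckbox(Checkbox):
    def render(self):
        return "[light checkbox]"


class DarkButton(Button):
    def render(self):
        return "[dark button]"


class DarkCheckbox(Checkbox):
    def render(self):
        return "[dark checkbox]"


class WidgetFactory(ABC):
    """The abstract factory: one creation method per product in the theme."""

    @abstractmethod
    def make_button(self): ...

    @abstractmethod
    def make_checkbox(self): ...


class LightFactory(WidgetFactory):
    def make_button(self):
        return LightButton()

    def make_checkbox(self):
        return LightCheckbox()


class DarkFactory(WidgetFactory):
    def make_button(self):
        return DarkButton()

    def make_checkbox(self):
        return DarkCheckbox()


def build_form(factory):
    # Client code: uses only the generic factory and product interfaces.
    return [factory.make_button().render(), factory.make_checkbox().render()]
```

`build_form` never names a concrete widget class; switching themes means passing a different factory.
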
Builder pattern

"...the intention of the builder pattern is to find a solution to the telescoping constructor anti-pattern[citation needed]. The telescoping constructor anti-pattern occurs when the increase of object constructor parameter combination leads to an exponential list of constructors. Instead of using numerous constructors, the builder pattern uses another object, a builder, that receives each initialization parameter step by step and then returns the resulting constructed object at once. " -- [8]

An example (taken, mostly quoted, from [9]):

We have a Car class. The problem is that a car has many options. The combination of each option would lead to a huge list of constructors for this class. So we will create a builder class, CarBuilder. We will send to the CarBuilder each car option step by step and then construct the final car with the right options:

class Car is
  Can have GPS and various numbers of seats.

class CarBuilder is
  method getResult() is
      output:  a Car with the right options
    Construct and return the car.

  method setSeats(number) is
      input:  the number of seats the car may have.
    Tell the builder the number of seats.

  method setGPS() is
    Make the builder remember that the car has a global positioning system.

  method unsetGPS() is
    Make the builder remember that the car does not have a global positioning system.

Construct a CarBuilder called carBuilder:

carBuilder.setSeats(2)
carBuilder.unsetGPS()
car := carBuilder.getResult()
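
A runnable Python rendering of the pseudocode above, as a sketch. The default values (4 seats, no GPS) and the `return self` chaining are additions for illustration and are not part of the quoted example.

```python
class Car:
    """Can have GPS and various numbers of seats."""

    def __init__(self, seats, gps):
        self.seats = seats
        self.gps = gps


class CarBuilder:
    def __init__(self):
        self._seats = 4      # assumed default, not in the original
        self._gps = False    # assumed default, not in the original

    def set_seats(self, number):
        """Tell the builder the number of seats."""
        self._seats = number
        return self  # returning self allows chaining (an addition)

    def set_gps(self):
        """Make the builder remember that the car has GPS."""
        self._gps = True
        return self

    def unset_gps(self):
        """Make the builder remember that the car has no GPS."""
        self._gps = False
        return self

    def get_result(self):
        """Construct and return the car."""
        return Car(self._seats, self._gps)


car_builder = CarBuilder()
car_builder.set_seats(2)
car_builder.unset_gps()
car = car_builder.get_result()
```

In Python specifically, keyword arguments with defaults often remove the need for a builder; the pattern matters more in languages without them.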

---

"The familiar three-tier application model — in which presentation, business logic, and persistence are separated..." -- Java Concurrency In Practice, as quoted by [10]

---

Command-Query Separation (CQS)

The Command-Query Separation pattern is when "every method should either be a command that performs an action, or a query that returns data to the caller, but not both. In other words, Asking a question should not change the answer. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects." [11]
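
A small sketch of CQS in Python, using a hypothetical `Stack` class. Python's built-in `list.pop` both changes the list and returns a value, so a strict CQS design would split it into a query (`top`) and a command (`remove_top`).

```python
class Stack:
    def __init__(self):
        self._items = []

    # Queries: return data, change nothing.
    def top(self):
        return self._items[-1]

    def size(self):
        return len(self._items)

    # Commands: change state, return nothing.
    def push(self, item):
        self._items.append(item)

    def remove_top(self):
        self._items.pop()
```

Calling `top()` twice in a row gives the same answer, which is the 'asking a question should not change the answer' property.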

---

" Logic after callback

Working on one of my Postgres-connected personal apps, I noticed a strange behavior when tests failed. After an initial test failure, the rest of the tests would each time out! It didn’t happen very often, because my tests didn’t fail that often, but it was there. And it was starting to get annoying. I decided to dig in.

My code was using the pg node module, so I delved into the node_modules directory and started adding logging. I discovered that pg needed to do some internal cleanup when a request was complete, which it did after calling the user-provided callback. So, when an exception was thrown, this code was skipped. Therefore pg was in a bad state, not ready for the next request. I submitted a pull request which was ultimately re-implemented and released in v1.0.2.

Get into the habit of calling callbacks as the last statement in your functions. It’s also a good idea to prefix it with a return statement too. Sometimes it can’t be on the last line, but it should always be the last thing. " -- [12]
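
The failure mode described in the quote is not specific to Node. Here is a sketch of the same bug and the fix in Python; the `Connection` class and its `busy` flag are hypothetical stand-ins and do not reflect the real API of the pg module.

```python
class Connection:
    def __init__(self):
        self.busy = False

    def query_cleanup_after(self, sql, callback):
        self.busy = True
        result = f"rows for {sql}"
        callback(result)      # if this raises, the next line never runs
        self.busy = False     # internal cleanup, skipped on exception

    def query_callback_last(self, sql, callback):
        self.busy = True
        result = f"rows for {sql}"
        self.busy = False     # internal cleanup finishes first
        return callback(result)  # the callback is the last thing that happens


def failing_callback(result):
    raise RuntimeError("test failed")
```

With `query_cleanup_after`, one failing callback leaves the connection marked busy for every later request; with `query_callback_last`, the exception still propagates but the connection is already back in a clean state.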

---

" Immutability

Blockchains are immutable. And for quite some time, distributed systems have relied on immutability to eliminate anomalies. Log-structured file system, log-structured merge-trees, and Copy-On-Write are common patterns/tricks used in Distributed Systems to model immutable data structures. Blockchains handle transactions in a similar way to event sourcing, the common technique used in Distributed Computing to handle facts and actions. Instead of overwriting data, you create an append-only log of all facts/actions that ever happened.

Pat Helland described the importance of immutability in his popular paper Immutability Changes Everything... " -- [13]
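
A toy illustration of the append-only idea from the quote, in Python. The `Ledger` class is hypothetical: facts are only ever appended, and the current state is derived by replaying the log rather than by overwriting a stored balance.

```python
class Ledger:
    def __init__(self):
        self._log = []  # append-only list of (account, delta) facts

    def record(self, account, delta):
        # The only write operation: add a new fact, never edit an old one.
        self._log.append((account, delta))

    def balance(self, account):
        # Current state is computed from the full history.
        return sum(delta for name, delta in self._log if name == account)

    def history(self):
        return tuple(self._log)  # read-only snapshot of the log
```

A correction is made by appending a compensating entry, so the earlier fact stays in the history.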

---

 "
 Some languages, like Java and C++, offer explicit interface support. Python is not among them. It offers implied interfaces in places where other languages would use explicit interfaces. This has a variety of effects, good and bad.

In Python, what classes your object is derived from is not a part of your object's interface.

Every use of isinstance is a violation of this promise, large or small. Whenever isinstance is used, control flow forks; one type of object goes down one code path, and other types of object go down the other --- even if they implement the same interface!

Bjarne Stroustrup often cited concerns like these when defending C++'s decision not to provide isinstance. (Now, of course, with RTTI in the C++ standard, C++ does provide isinstance.)

Sometimes, of course, violating this promise is worth the payoffs --- isinstance, like goto, is not pure evil. But it is a trap for new programmers. Beware! Don't use isinstance unless you know what you're doing. It can make your code non-extensible and break it in strange ways down the line. " -- [14]
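
A sketch of the fork in control flow that the quote warns about. The functions and the `ListWriter` class are hypothetical; `io.StringIO` is the standard-library class.

```python
import io


def write_report_isinstance(out, text):
    # Forks on the concrete type: only real io.StringIO objects are accepted.
    if isinstance(out, io.StringIO):
        out.write(text)
        return True
    return False


def write_report_duck(out, text):
    # Relies only on the implied interface: anything with a write() method.
    out.write(text)
    return True


class ListWriter:
    """Not derived from any io class, but implements write()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
```

`ListWriter` satisfies the implied interface, yet the `isinstance` version silently refuses it, which is the non-extensibility the quote describes.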

---

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)

---

https://mostly-adequate.gitbooks.io/mostly-adequate-guide/

https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript

---

"...DRY (don't repeat yourself), YAGNI (ya ain't gonna need it), loose coupling high cohesion, the principle of least surprise, single responsibility, and so on. " [15]

---

https://github.com/fantasyland/fantasy-land

" (aka "Algebraic JavaScript Specification")

This project specifies interoperability of common algebraic structures:

    Setoid
    Ord
    Semigroupoid
    Category
    Semigroup
    Monoid
    Group
    Filterable
    Functor
    Contravariant
    Apply
    Applicative
    Alt
    Plus
    Alternative
    Foldable
    Traversable
    Chain
    ChainRec
    Monad
    Extend
    Comonad
    Bifunctor
    Profunctor

"

---

https://github.com/getify/Functional-Light-JS/blob/master/README.md

---

concurrency

ACM Queue

Real-world Concurrency from The Concurrency Problem Vol. 6, No. 5 - September 2008 by Bryan Cantrill and Jeff Bonwick, Sun Microsystems

https://cacm.acm.org/magazines/2008/11/538-real-world-concurrency/fulltext https://www.cs.helsinki.fi/u/kerola/rio/papers/cantrill_bonwick_2008.pdf

---

https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell

---

protocols, formats, data types, data models

in protocols sometimes you have a distinction between a 'data type' and a 'format'. A data model is more abstract, so let's start with 'format'. A format (which is similar to a syntax or a data representation) is the way that you write down a data value. For example, JSON is a format, the idea of enclosing strings in double-quotes is a syntax, and the concept of a zero-terminated "C-style" string is a data representation. A data type is a type of thing that a format/syntax/data representation is trying to represent; for example, "string" is a data type, and the double-quote syntax, the C-string representation, and the Pascal-string representation are all representations of that type. Similarly, a list can be stored as a linked list or as a contiguous array.
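
The string example can be made concrete. The sketch below, in Python, writes one value of the data type "string" in three ways: JSON's double-quote syntax, a zero-terminated C-style representation, and a length-prefixed Pascal-style representation (assuming a single length byte and ASCII text). The helper function names are hypothetical.

```python
import json

text = "hi"  # one value of the data type "string"

as_json = json.dumps(text)                                    # double-quote syntax
as_c_string = text.encode("ascii") + b"\x00"                  # zero-terminated
as_pascal_string = bytes([len(text)]) + text.encode("ascii")  # length-prefixed


def decode_c_string(raw):
    # Read bytes up to (not including) the first zero byte.
    return raw[: raw.index(b"\x00")].decode("ascii")


def decode_pascal_string(raw):
    # First byte is the length; the text follows.
    length = raw[0]
    return raw[1 : 1 + length].decode("ascii")
```

All three byte sequences differ, but decoding any of them gives back the same string value.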

A 'data model' is a data type with some additional semantics defined. For example, the RDF data model specification tells you how to encode, for example, English sentences like "Lassila is the creator of the resource http://www.w3.org/Home/Lassila". The RDF view of the semantics of this sentence can be translated from an English representation to an XML representation.

Sometimes you start with a concrete thing and generalize it to a data model; for example, you might first learn about RDF in the guise of its XML format, and then abstract that to the RDF data model. This can be confusing because the data model is so abstract, and at first it seems hard to think about it divorced from any particular format. Another thing that can happen is that you first learn about a particular standard expressed in a particular format, and then you later generalize/separate out the format for use with other things; for example, you might first learn about URLs and then later consider generalizing the syntax of URLs for use in things that are not really 'locations'. This can be confusing because it seems hard at first to think about the syntax divorced from the semantics of data that you originally learned to use it for.

---

" Classes and Data Structures are opposites in at least three different ways.

    Classes make functions visible while keeping data implied. Data structures make data visible while keeping functions implied.
    Classes make it easy to add types but hard to add functions. Data structures make it easy to add functions but hard to add types.
    Data Structures expose callers to recompilation and redeployment. Classes isolate callers from recompilation and redeployment.

...

An object is a set of functions that operate upon encapsulated data elements.

    Or rather, an object is a set of functions that operate on implied data elements.

...

    Consider a set of object classes that all conform to a common interface. For example, imagine classes that represent two dimensional shapes that all have functions for calculating the area and perimeter of the shape.

...

    Let’s just consider two different types: Squares and Circles. It should be clear that the area and perimeter functions of these two classes operate on different implied data structures. It should also be clear that the way those operations are called is via dynamic polymorphism.

...

    There are two different area functions; one for Square, the other for Circle. When the caller invokes the area function on a particular object, it is that object that knows what function to call. We call that dynamic polymorphism.

...

    Now let’s turn those objects into data structures. We’ll use Discriminated Unions.

...

    Discriminated Unions. In our case that’s just two different data structures. One for Square and the other for Circle. The Circle data structure has a center point, and a radius for data elements. It’s also got a type code that identifies it as a Circle.

You mean like an enum?

    Sure. The Square data structure has the top left point, and the length of the side. It also has the type discriminator – the enum.

OK. Two data structures with a type code.

...

    Right. Now consider the area function. It’s going to have a switch statement in it, isn’t it?

Um. Sure, for the two different cases. One for Square and the other for Circle. And the perimeter function will need a similar switch statement

...

     If you want to add the Triangle type to the object scenario, what code must change?

No code changes. You just create the new Triangle class. Oh, I suppose the creator of the instance has to be changed.

    Right. So when you add a new type, very little changes. Now suppose you want to add a new function - say the center function.

Well then you’d have to add that to all three types, Circle, Square ,and Triangle.

    Good. So adding new functions is hard, you have to change each class.

But with data structures it’s different. In order to add Triangle you have to change each function to add the Triangle case to the switch statements.

    Right. Adding new types is hard, you have to change each function.

But when you add the new center function, nothing has to change.

    Yup. Adding new functions is easy.

Wow. It’s the exact opposite.

    It certainly is. Let’s review:
        Adding new functions to a set of classes is hard, you have to change each class.
        Adding new functions to a set of data structures is easy, you just add the function, nothing else changes.
        Adding new types to a set of classes is easy, you just add the new class.
        Adding new types to a set of data structures is hard, you have to change each function.

Yeah. Opposites. Opposites in an interesting way. I mean, if you know that you are going to be adding new functions to a set of types, you’d want to use data structures. But if you know you are going to be adding new types then you want to use classes.

    Good observation! But there’s one last thing for us to consider today. There’s yet another way in which data structures and classes are opposites. It has to do with dependencies.

Dependencies?

    Yes, the direction of the source code dependencies.

OK, I’ll bite. What’s the difference?

    Consider the data structure case. Each function has a switch statement that selects the appropriate implementation based upon the type code within the discriminated union.

OK, that’s true. But so what?

    Consider a call to the area function. The caller depends upon the area function, and the area function depends upon every specific implementation.

What do you mean by “depends”?

    Imagine that each of the implementations of area is written into its own function. So there’s circleArea and squareArea and triangleArea.

OK, so the switch statement just calls those functions.

    Imagine those functions are in different source files.

Then the source file with the switch statement would have to import, or use, or include, all those source files.

    Right. That’s a source code dependency. One source file depends upon another source file. What is the direction of that dependency?

The source file with the switch statement depends upon the source files that contain all the implementations.

    And what about the caller of the area function?

The caller of the area function depends upon the source file with the switch statement which depends upon all the implementations.

    Correct. All the source file dependencies point in the direction of the call, from the caller to the implementation. So if you make a tiny change to one of those implementations…

OK, I see where you are going with this. A change to any one of the implementations will cause the source file with the switch statement to be recompiled, which will cause everyone who calls that switch statement – the area function in our case – to be recompiled.

...

this is reversed in the case of classes?

    Yes, because the caller of the area function depends upon an interface, and the implementation functions also depend upon that interface.

I see what you mean. The source file of the Square class imports, or uses, or includes the source file of the Shape interface.

    Right. The source files of the implementation point in the opposite direction of the call. They point from the implementation to the caller. At least that’s true for statically typed languages. For dynamically typed languages the caller of the area function depends upon nothing at all. The linkages get worked out at run time.

Right. OK. So if you make a change to one of the implementations…

    Only the changed file needs to be recompiled or redeployed.

And that’s because the dependencies between the source files point against the direction of the call.

    Right. We call that Dependency Inversion.

" -- [16]

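The two styles from the dialogue can be put side by side in a few lines of Python. This is a simplified sketch: the shapes carry only a size (no corner or center points as in the dialogue), and the names `SquareObj`, `CircleObj`, and `ShapeData` are hypothetical.

```python
import math
from dataclasses import dataclass


# Class style: each type carries its own area(); callers rely on dispatch.
class Shape:
    def area(self):
        raise NotImplementedError


class SquareObj(Shape):
    def __init__(self, side):
        self._side = side

    def area(self):
        return self._side ** 2


class CircleObj(Shape):
    def __init__(self, radius):
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2


# Data-structure style: a plain record with a type code;
# every function switches on that code.
@dataclass
class ShapeData:
    kind: str     # type discriminator: "square" or "circle"
    size: float   # side length or radius


def area(shape):
    if shape.kind == "square":
        return shape.size ** 2
    if shape.kind == "circle":
        return math.pi * shape.size ** 2
    raise ValueError(f"unknown kind: {shape.kind}")


def perimeter(shape):
    # A new function: nothing that already exists has to change.
    if shape.kind == "square":
        return 4 * shape.size
    if shape.kind == "circle":
        return 2 * math.pi * shape.size
    raise ValueError(f"unknown kind: {shape.kind}")
```

Adding a triangle means one new class in the first style, but a new branch in both `area` and `perimeter` in the second; adding `perimeter` was free in the second style but would touch every class in the first.
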
" Jach:

I guess this applies for Java and C++ style "classes". This does not precisely apply to the first ANSI-standardized OOP system, Common Lisp's. Standard classes do not own methods, instead methods are specializations of a generic function that stands alone and dispatches on the class types (or EQL values) of all its arguments.

I'd really like it if Uncle Bob eventually has his fill of Clojure and moves on to explore what Common Lisp built decades earlier, then blogs about that too.

unscaled:

It's not just CLOS. All general classifications like this are doomed to fail when you start looking through different languages.

I think one very clear example is hybrid sum types like Scala's case classes (on sealed trait/abstract class), Kotlin's sealed classes and probably Swift enums too. They can all be used as a pure sum-type, but they don't forego inheritance and polymorphism.

I think the more important distinction relies on how you use it, regardless of whether the language allows you to do more. Do you make the data layout a contract (in the case of sum types: commit to closed set of cases) and make changing the "schema" harder? Or do you make the provided set of functions a contract and make it harder to add new functionality that will be supported by all data types?

dreamcompiler:

Came here to say exactly this and you already did, so thanks. It's amazingly liberating to use a language where generic functions are first-class, and classes don't own any methods. Once you've written code this way, the other way seems backward and restrictive.

StefanKarpinski:

Spot on. Multiple dispatch avoids the whole issue because methods are external and don't live inside of classes. Lisps, of course support multimethods, which is great. There are some down sides, though. They are opt-in (defmethod) and tend to have a significant performance hit associated with them. Someone needs to anticipate your need to add types and/or functions and think it's worth sacrificing performance for that ability.

Julia, builds on this tradition but allows you to have your cake and eat it too. It has multimethods/generic functions and they are the only option—all user defined functions are multimethods. They also have excellent performance (they're used for everything, they have to).

Of course, there's no free lunch and you do give up traditional separate compilation, but the degree of composability it gives to the ecosystem is hard to comprehend without experiencing it. Simple, reusable data types are shared across the ecosystem with anyone adding whatever (external) methods they want. Generic code that handles a literally exponential explosion of argument types "just works", and the compiler generates fast code. All without doing anything special, since multiple dispatch is the default and only way functions work.

mrbrowning:

There are substantial differences between Clojure's constellation of protocols/records/multimethods and CLOS, but at least the feature of CLOS that you cite is exactly what multimethods in Clojure do, see: https://clojure.org/reference/multimethods

" -- [17]
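
Python's standard library has a limited form of the generic functions discussed in these comments: `functools.singledispatch`. It dispatches on the type of the first argument only, so it is single dispatch, not the multiple dispatch of CLOS, Clojure, or Julia. It is still enough to sketch the idea that methods can live outside classes. The shape classes are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Square:
    side: float


@dataclass
class Circle:
    radius: float


@singledispatch
def area(shape):
    # Fallback for types with no registered implementation.
    raise NotImplementedError(f"no area for {type(shape).__name__}")


@area.register
def _(shape: Square):
    return shape.side ** 2


@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2


# Adding a type later: no existing class or function body is edited.
@dataclass
class Triangle:
    base: float
    height: float


@area.register
def _(shape: Triangle):
    return 0.5 * shape.base * shape.height
```

Adding a new function is just another `@singledispatch` definition, and adding a new type is just another `register` call, so neither direction requires editing existing code.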

---