proj-plbook-plPartPractices

Table of Contents for Programming Languages: a survey

Part IV: Idioms, patterns, and best practices

DRY

YAGNI

visitor

strategy pattern

factory pattern

inversion of control

http://en.wikipedia.org/wiki/Inversion_of_control

http://en.wikipedia.org/wiki/Dependency_injection (a form of inversion of control)

http://en.wikipedia.org/wiki/Template:Design_Patterns_patterns

misc

two-way model binding

" ...the most important factor in making your code maintainable in the long run is breaking dependencies. If piece X depends on seven other pieces and isn't a trivial bit of wiring code, that's a smell. " -- Nick Stenning, http://list.hypothes.is/archive/dev/2014-09/

futureproofing patterns

(cross-reference extensibility)

http://journal.stuffwithstuff.com/2010/09/18/futureproofing-uniform-access-and-masquerades/

points out three kinds of futureproofing in java:

these are all annoying boilerplate, but they are all good things to do because if you don't, and you want to make one of the following changes, then you must change every call site, instead of just one line. This is especially bad if your program is a shipped library and the call sites are in client code (e.g. you would have to make breaking/incompatible changes to your library):

now, Jasper already deals with the first two of these, and maybe the third. If not, we should probably deal with the third, too. That is to say, when you call a constructor (if we have constructors at all; i'm leaning towards yes), you don't actually determine an implementation but merely an interface, and perhaps a factory method to determine the implementation. In other words, 'everything is an interface', like we always say.

in http://journal.stuffwithstuff.com/2010/10/21/the-language-i-wish-go-was/, he also points out a few more that Go handles that Java doesn't:

in http://journal.stuffwithstuff.com/2010/10/21/the-language-i-wish-go-was/, he also points out other kinds of futureproofing that aren't needed in Java but that may be needed in other languages, such as Go:

book rec: design patterns

Chapter: Types of coding tasks

numeric linear algebra:

operator overloading is crucial

https://news.ycombinator.com/item?id=6284842

embedded:

memory footprint is crucial (often, GC isn't good enough, e.g. Objective-C)

hobby project:

little boilerplate; no need for team features

simulation: CPU-bound; speed; OOP; parallelism

Concurrency patterns

overuse of synchronization leads to throwing away a lot of concurrency. advanced algorithms such as lock-free algorithms accept/permit nondeterminacy as much as possible during their execution, but still give deterministic results at the end. but they are difficult to write/reason about.

if using basic locking, where to lock in data structures (e.g. to create critical sections): create a critical section out of any piece of code s.t. if you stopped in the middle, an invariant would be violated; or any piece of code s.t. if another thread updated shared memory in the middle, there would be a problem.

beware: returns, gotos, possible exceptions; anything that could interrupt you

beware: function calls that might call something else that would block on one of the locks you've acquired
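A minimal sketch of the basic-locking advice above, in Python. The `Bank` class and its invariant (the sum of all balances never changes) are hypothetical, made up for illustration. The debit-then-credit update is a critical section because stopping in the middle would violate the invariant; the `with` statement releases the lock even on an early return or an exception.

```python
import threading


class Bank:
    """Hypothetical example: the invariant is that the sum of balances is constant."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        # Critical section: between the debit and the credit the invariant
        # is temporarily violated, so the whole update holds the lock.
        with self._lock:
            if self._balances[src] < amount:
                return False  # early return; `with` still releases the lock
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def total(self):
        with self._lock:
            return sum(self._balances.values())


bank = Bank({"a": 100, "b": 100})


def worker():
    for _ in range(1000):
        bank.transfer("a", "b", 1)
        bank.transfer("b", "a", 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that `threading.Lock` is not reentrant: if `transfer` called `total()` while holding the lock, it would block forever, which is the "function calls that might block on one of the locks you've acquired" hazard.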

POSIX, pthreads

safety patterns

http://spinroot.com/gerard/pdf/P10.pdf is a great read. it has 10 simple rules for safety:

" 1. Rule: Restrict all code to very simple control flow constructs – do not use goto statements, setjmp or longjmp constructs, and direct or indirect recursion.

Rationale: Simpler control flow translates into stronger capabilities for verification and often results in improved code clarity. The banishment of recursion is perhaps the biggest surprise here. Without recursion, though, we are guaranteed to have an acyclic function call graph, which can be exploited by code analyzers, and can directly help to prove that all executions that should be bounded are in fact bounded. (Note that this rule does not require that all functions have a single point of return – although this often also simplifies control flow. There are enough cases, though, where an early error return is the simpler solution.)

2. Rule: All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded...This rule does not, of course, apply to iterations that are meant to be non-terminating (e.g., in a process scheduler). In those special cases, the reverse rule is applied: it should be statically provable that the iteration cannot terminate.

...

3. Rule: Do not use dynamic memory allocation after initialization.

...

4. Rule: No function should be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.

5. Rule: The assertion density of the code should average to a minimum of two assertions per function. Assertions are used to check for anomalous conditions that should never happen in real-life executions. Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition to the caller of the function that executes the failing assertion.

...

A typical use of an assertion would be as follows:

if (!c_assert(p >= 0) == true) { return ERROR; }

with the assertion defined as follows:

  #define c_assert(e) ((e) ? (true) : \
      tst_debugging("%s,%d: assertion '%s' failed\n", \
      __FILE__, __LINE__, #e), false)

In this definition, __FILE__ and __LINE__ are predefined by the macro preprocessor to produce the filename and line-number of the failing assertion. The syntax #e turns the assertion condition e into a string that is printed as part of the error message.

...

6. Rule: Data objects must be declared at the smallest possible level of scope.

Rationale: This rule supports a basic principle of data-hiding. Clearly if an object is not in scope, its value cannot be referenced or corrupted. Similarly, if an erroneous value of an object has to be diagnosed, the fewer the number of statements where the value could have been assigned; the easier it is to diagnose the problem....

7. Rule : The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.

... In its strictest form, this rule means that even the return value of printf statements and file close statements must be checked. One can make a case, though, that if the response to an error would rightfully be no different than the response to success, there is little point in explicitly checking a return value.... In cases like these, it can be acceptable to explicitly cast the function return value to (void) – thereby indicating that the programmer explicitly and not accidentally decides to ignore a return value.

8. Rule: The use of the preprocessor must be limited to the inclusion of header files and simple macro definitions. Token pasting, variable argument lists (ellipses), and recursive macro calls are not allowed. All macros must expand into complete syntactic units. The use of conditional compilation directives is often also dubious, but cannot always be avoided. This means that there should rarely be justification for more than one or two conditional compilation directives even in large software development efforts, beyond the standard boilerplate that avoids multiple inclusion of the same header file. Each such use should be flagged by a tool-based checker and justified in the code.

Rationale: The C preprocessor is a powerful obfuscation tool that can destroy code clarity and befuddle many text based checkers....Note that with just ten conditional compilation directives, there could be up to 2^10 possible versions of the code, each of which would have to be tested– causing a huge increase in the required test effort.

9. Rule: The use of pointers should be restricted. Specifically, no more than one level of dereferencing is allowed. Pointer dereference operations may not be hidden in macro definitions or inside typedef declarations. Function pointers are not permitted.

Rationale: Pointers are easily misused, even by experienced programmers. They can make it hard to follow or analyze the flow of data in a program, especially by tool- based static analyzers. Function pointers, similarly, can seriously restrict the types of checks that can be performed by static analyzers and should only be used if there is a strong justification for their use, and ideally alternate means are provided to assist tool-based checkers determine flow of control and function call hierarchies. For instance, if function pointers are used, it can become impossible for a tool to prove absence of recursion, so alternate guarantees would have to be provided to make up for this loss in analytical capabilities.

10. Rule: All code must be compiled, from the first day of development, with all compiler warnings enabled at the compiler’s most pedantic setting. All code must compile with these setting without any warnings. All code must be checked daily with at least one, but preferably more than one, state-of-the-art static source code analyzer and should pass the analyses with zero warnings.

...

" -- http://spinroot.com/gerard/pdf/P10.pdf

GRASP

https://en.wikipedia.org/wiki/GRASP_%28object-oriented_design%29

Patterns:

design criteria for programs

software quality metrics

Links

todo

"Joshua Bloch, in his book “Effective Java” says, “Favor composition over inheritance”. " -- via http://www.smashcompany.com/technology/object-oriented-programming-is-an-expensive-disaster-which-must-end

Dependency injection

https://martinfowler.com/articles/injection.html discusses different mechanisms for dependency injection, and a similar pattern called service locator. Dependency injection applies when some class A calls (and possibly even constructs) class B, but you want to generalize A so that it can use any class in place of B. To do this, A takes its B as a parameter in some form or another, and some other registry provides the 'B' parameter value. In dependency injection the registry calls class A in order to 'inject' the 'B' parameter value. Since the registry calls class A (rather than class A calling the registry), this is an instance of 'inversion of control'; so 'dependency injection' is the special case of 'inversion of control' in which the purpose of the inversion is to provide the 'B' parameter value (the 'dependency'). The methods described by the article are constructor injection, setter injection, and interface injection.

The article then compares these to service locators. With a service locator, A has a reference to the registry and calls the registry to get B. An advantage of service locators is that they are conceptually simpler. A disadvantage is that, in order to see what dependencies A has, you have to look for all of A's calls to the registry, rather than just looking at static metadata such as constructor parameters, setters defined, or interfaces implemented. Another disadvantage arises when one big service locator is used for many services throughout a program: the locator can become too 'fat' and hard to mock when testing A, whereas with dependency injection the person writing A controls the complexity of everything needed for testing. A service locator designer who pays attention to this can avoid the problem.

Fowler prefers Service Locators because they are simpler, except when building classes to be used in multiple applications, in which case he prefers Dependency Injection. Within Dependency Injection, he prefers Constructor Injection, because it's simpler, and uses Setter Injection only when there is some specific reason to.
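The difference between the two approaches can be sketched in Python as follows. All class names here (`MemoryLogger`, `InjectedService`, `SetterService`, `ServiceLocator`, `LocatingService`) are hypothetical and are not taken from Fowler's article; `MemoryLogger` plays the role of 'B' and the service classes play the role of 'A'.

```python
class MemoryLogger:
    """Hypothetical dependency 'B': records messages in a list."""

    def __init__(self):
        self.lines = []

    def log(self, msg):
        self.lines.append(msg)


class InjectedService:
    """Constructor injection: the dependency is visible in the signature."""

    def __init__(self, logger):
        self._logger = logger

    def run(self):
        self._logger.log("constructor-injected run")


class SetterService:
    """Setter injection: the dependency is supplied after construction."""

    def __init__(self):
        self._logger = None

    def set_logger(self, logger):
        self._logger = logger

    def run(self):
        self._logger.log("setter-injected run")


class ServiceLocator:
    """Service locator: a registry that classes query for their dependencies."""

    _services = {}

    @classmethod
    def register(cls, name, service):
        cls._services[name] = service

    @classmethod
    def get(cls, name):
        return cls._services[name]


class LocatingService:
    """A calls the registry itself; the dependency is hidden inside run()."""

    def run(self):
        ServiceLocator.get("logger").log("located run")


logger = MemoryLogger()

# Dependency injection: the assembling code hands B to A.
InjectedService(logger).run()
setter_service = SetterService()
setter_service.set_logger(logger)
setter_service.run()

# Service locator: A asks the registry for B.
ServiceLocator.register("logger", logger)
LocatingService().run()
```

To discover that `LocatingService` needs a logger you have to read the body of `run()`, whereas the injected variants advertise it in a constructor parameter or a setter.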

Single Responsibility Principle

Single Responsibility Principle vs. encapsulation

" The Single Responsibility Principle (SRP) says that a class should have one, and only one, reason to change. To say this a different way, the methods of a class should change for the same reasons, they should not be affected by different forces that change at different rates.

As an example, imagine the following class in Java:

class Employee {
  public Money calculatePay() {…}
  public void save() {…}
  public String reportHours() {…}
}

This class violates the SRP because it has three reasons to change. The first is the business rules having to do with calculating pay. The second is the database schema. The third is the format of the string that reports hours....Of course this seems to fly in the face of OO concepts since a good object should contain all the methods that manipulate it. " -- [6]
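One common response to this kind of violation is to keep `Employee` as plain data and move each reason to change into its own class. The Python sketch below is an assumption for illustration, not the quoted author's solution; the names `PayCalculator`, `EmployeeRepository`, and `HoursReporter`, and the fields on `Employee`, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Plain data; the field names are assumed for this example."""
    name: str
    hourly_rate: float
    hours: float


class PayCalculator:
    """Changes only when the business rules for pay change."""

    def calculate_pay(self, employee):
        return employee.hourly_rate * employee.hours


class EmployeeRepository:
    """Changes only when the storage schema changes (an in-memory dict here)."""

    def __init__(self):
        self._rows = {}

    def save(self, employee):
        self._rows[employee.name] = employee

    def load(self, name):
        return self._rows[name]


class HoursReporter:
    """Changes only when the report format changes."""

    def report_hours(self, employee):
        return f"{employee.name}: {employee.hours:.1f} h"


ada = Employee(name="Ada", hourly_rate=20.0, hours=40.0)
repository = EmployeeRepository()
repository.save(ada)
pay = PayCalculator().calculate_pay(ada)
report = HoursReporter().report_hours(ada)
```

This is exactly the trade-off the quote points at: the methods no longer live on the object they manipulate.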

Abstract factory pattern

"The abstract factory pattern provides a way to encapsulate a group of individual factories that have a common theme without specifying their concrete classes...the client software creates a concrete implementation of the abstract factory and then uses the generic interface of the factory to create the concrete objects that are part of the theme. The client doesn't know (or care) which concrete objects it gets from each of these internal factories, since it uses only the generic interfaces of their products." -- [7]

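A minimal Python sketch of the abstract factory pattern described above. The 'theme' here is a light or dark widget set; all names (`WidgetFactory`, `LightFactory`, `DarkFactory`, `build_form`, and the widget classes) are hypothetical examples.

```python
from abc import ABC, abstractmethod


class Button(ABC):
    @abstractmethod
    def render(self):
        ...


class Checkbox(ABC):
    @abstractmethod
    def render(self):
        ...


class LightButton(Button):
    def render(self):
        return "[light button]"


class LightCheckbox(Checkbox):
    def render(self):
        return "[light checkbox]"


class DarkButton(Button):
    def render(self):
        return "[dark button]"


class DarkCheckbox(Checkbox):
    def render(self):
        return "[dark checkbox]"


class WidgetFactory(ABC):
    """The abstract factory: one creation method per product in the theme."""

    @abstractmethod
    def create_button(self):
        ...

    @abstractmethod
    def create_checkbox(self):
        ...


class LightFactory(WidgetFactory):
    def create_button(self):
        return LightButton()

    def create_checkbox(self):
        return LightCheckbox()


class DarkFactory(WidgetFactory):
    def create_button(self):
        return DarkButton()

    def create_checkbox(self):
        return DarkCheckbox()


def build_form(factory):
    # Client code: uses only the generic factory and product interfaces,
    # so it never names a concrete widget class.
    return [factory.create_button().render(), factory.create_checkbox().render()]


light_form = build_form(LightFactory())
dark_form = build_form(DarkFactory())
```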
Builder pattern

"...the intention of the builder pattern is to find a solution to the telescoping constructor anti-pattern[citation needed]. The telescoping constructor anti-pattern occurs when the increase of object constructor parameter combination leads to an exponential list of constructors. Instead of using numerous constructors, the builder pattern uses another object, a builder, that receives each initialization parameter step by step and then returns the resulting constructed object at once. " -- [8]

An example (taken, mostly quoted, from [9]):

We have a Car class. The problem is that a car has many options. The combination of each option would lead to a huge list of constructors for this class. So we will create a builder class, CarBuilder. We will send to the CarBuilder each car option step by step and then construct the final car with the right options:

class Car is
  Can have GPS and various numbers of seats.

class CarBuilder is
  method getResult() is
      output:  a Car with the right options
    Construct and return the car.

  method setSeats(number) is
      input:  the number of seats the car may have.
    Tell the builder the number of seats.

  method setGPS() is
    Make the builder remember that the car has a global positioning system.

  method unsetGPS() is
    Make the builder remember that the car does not have a global positioning system.

Construct a CarBuilder called carBuilder:

carBuilder.setSeats(2)
carBuilder.unsetGPS()
car := carBuilder.getResult()
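The pseudocode above could be rendered in runnable Python roughly as follows. The method names are snake_case versions of the ones in the pseudocode; the default of four seats and no GPS is an assumption, since the pseudocode does not specify defaults.

```python
class Car:
    """Can have GPS and various numbers of seats."""

    def __init__(self, seats, gps):
        self.seats = seats
        self.gps = gps


class CarBuilder:
    def __init__(self):
        # Defaults are assumed for this sketch; the pseudocode gives none.
        self._seats = 4
        self._gps = False

    def set_seats(self, number):
        """Tell the builder the number of seats."""
        self._seats = number

    def set_gps(self):
        """Make the builder remember that the car has GPS."""
        self._gps = True

    def unset_gps(self):
        """Make the builder remember that the car does not have GPS."""
        self._gps = False

    def get_result(self):
        """Construct and return the car with the options set so far."""
        return Car(self._seats, self._gps)


car_builder = CarBuilder()
car_builder.set_seats(2)
car_builder.unset_gps()
car = car_builder.get_result()
```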

---

"The familiar three-tier application model — in which presentation, business logic, and persistence are separated..." -- Java Concurrency In Practice, as quoted by [10]

---

Command-Query Separation (CQS)

The Command-Query Separation pattern is when "every method should either be a command that performs an action, or a query that returns data to the caller, but not both. In other words, Asking a question should not change the answer. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects." [11]
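A small Python sketch of the distinction, using a hypothetical ticket counter. `TicketCounterMixed` violates the pattern because asking for the next ticket also changes the answer; `TicketCounter` separates the query (`peek`) from the command (`advance`).

```python
class TicketCounterMixed:
    """Violates CQS: one method both changes state and returns data."""

    def __init__(self):
        self._next = 1

    def next_ticket(self):
        number = self._next
        self._next += 1
        return number


class TicketCounter:
    """Follows CQS: queries return data, commands change state."""

    def __init__(self):
        self._next = 1

    def peek(self):
        # Query: no side effects, so repeated calls give the same answer.
        return self._next

    def advance(self):
        # Command: changes state and returns nothing.
        self._next += 1


mixed = TicketCounterMixed()
mixed_first = mixed.next_ticket()
mixed_second = mixed.next_ticket()  # asking again changed the answer

counter = TicketCounter()
first_look = counter.peek()
second_look = counter.peek()  # same answer as before
counter.advance()
after_advance = counter.peek()
```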

---

" Logic after callback

Working on one of my Postgres-connected personal apps, I noticed a strange behavior when tests failed. After an initial test failure, the rest of the tests would each time out! It didn’t happen very often, because my tests didn’t fail that often, but it was there. And it was starting to get annoying. I decided to dig in.

My code was using the pg node module, so I delved into the node_modules directory and started adding logging. I discovered that pg needed to do some internal cleanup when a request was complete, which it did after calling the user-provided callback. So, when an exception was thrown, this code was skipped. Therefore pg was in a bad state, not ready for the next request. I submitted a pull request which was ultimately re-implemented and released in v1.0.2.

Get into the habit of calling callbacks as the last statement in your functions. It’s also a good idea to prefix it with a return statement too. Sometimes it can’t be on the last line, but it should always be the last thing. " -- [12]
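The failure mode in the quote can be sketched in Python (the original is Node.js). The `Client` class below is a hypothetical stand-in, not the pg module's API: its `busy` flag represents the internal cleanup that was skipped when the user callback raised.

```python
class Client:
    """Hypothetical stand-in for a database client; not the pg module's API."""

    def __init__(self):
        self.busy = False

    def query_cleanup_after(self, sql, callback):
        self.busy = True
        result = "rows for " + sql
        callback(result)    # if this raises, the cleanup below never runs
        self.busy = False   # internal cleanup

    def query_callback_last(self, sql, callback):
        self.busy = True
        result = "rows for " + sql
        self.busy = False         # internal cleanup happens first
        return callback(result)   # calling the callback is the last thing done


def failing_callback(result):
    raise AssertionError("simulated test failure")


def run(method):
    """Call a query method with a failing callback; return the client's busy flag."""
    client = Client()
    try:
        method(client, "SELECT 1", failing_callback)
    except AssertionError:
        pass
    return client.busy


stuck_after_failure = run(Client.query_cleanup_after)
clean_after_failure = run(Client.query_callback_last)
```

With cleanup placed after the callback, the client is left marked busy after a failure; with the callback as the final statement, it is ready for the next request.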

---

" Immutability

Blockchains are immutable. And for quite some time, distributed systems have relied on immutability to eliminate anomalies. Log-structured file system, log-structured merge-trees, and Copy-On-Write are common patterns/tricks used in Distributed Systems to model immutable data structures. Blockchains handle transactions in a similar way to event sourcing, the common technique used in Distributed Computing to handle facts and actions. Instead of overwriting data, you create an append-only log of all facts/actions that ever happened.

Pat Helland described the importance of immutability in his popular paper Immutability Changes Everything... " -- [13]
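A minimal Python sketch of the append-only idea behind event sourcing. The `Ledger` class is hypothetical: state is never overwritten; instead every fact is appended to a log, and the current state is derived by replaying it. A mistake is corrected by appending a compensating event rather than editing history.

```python
class Ledger:
    """Hypothetical append-only event log; balances are derived from it."""

    def __init__(self):
        self._events = []

    def record(self, account, delta):
        # The only write operation is an append; nothing is ever updated in place.
        self._events.append((account, delta))

    def balance(self, account):
        # Current state is computed by replaying the log.
        return sum(delta for name, delta in self._events if name == account)

    def history(self):
        # Return an immutable copy so callers cannot rewrite the past.
        return tuple(self._events)


ledger = Ledger()
ledger.record("alice", 100)
ledger.record("alice", -30)
ledger.record("alice", -5)   # recorded by mistake
ledger.record("alice", 5)    # compensating event; the mistake stays in the log
```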

---

 "
 Some languages, like Java and C++, offer explicit interface support. Python is not among them. It offers implied interfaces in places where other languages would use explicit interfaces. This has a variety of effects, good and bad.

In Python, what classes your object is derived from is not a part of your object's interface.

Every use of isinstance is a violation of this promise, large or small. Whenever isinstance is used, control flow forks; one type of object goes down one code path, and other types of object go down the other --- even if they implement the same interface!

Bjarne Stroustrup often cited concerns like these when defending C++'s decision not to provide isinstance. (Now, of course, with RTTI in the C++ standard, C++ does provide isinstance.)

Sometimes, of course, violating this promise is worth the payoffs --- isinstance, like goto, is not pure evil. But it is a trap for new programmers. Beware! Don't use isinstance unless you know what you're doing. It can make your code non-extensible and break it in strange ways down the line. " -- [14]
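The fork in control flow that the quote warns about can be illustrated with a short Python sketch. The functions `read_all_strict` and `read_all_duck` and the class `FakeFile` are hypothetical names for this example.

```python
import io


def read_all_strict(f):
    # isinstance check: only objects derived from io.IOBase are accepted.
    if not isinstance(f, io.IOBase):
        raise TypeError("expected a file object")
    return f.read()


def read_all_duck(f):
    # Implied interface: anything with a read() method works.
    return f.read()


class FakeFile:
    """Implements the same read() interface without inheriting from io.IOBase."""

    def read(self):
        return "hello"


real_result = read_all_strict(io.StringIO("hello"))
duck_result = read_all_duck(FakeFile())

try:
    read_all_strict(FakeFile())
    strict_accepts_fake = True
except TypeError:
    strict_accepts_fake = False
```

`FakeFile` satisfies the implied interface, yet the `isinstance` version rejects it, which is the non-extensibility the quote describes.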

---

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)

---

https://mostly-adequate.gitbooks.io/mostly-adequate-guide/

https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript

---

"...DRY (don't repeat yourself), YAGNI (ya ain't gonna need it), loose coupling high cohesion, the principle of least surprise, single responsibility, and so on. " [15]

---

https://github.com/fantasyland/fantasy-land

" (aka "Algebraic JavaScript Specification")

This project specifies interoperability of common algebraic structures:

    Setoid
    Ord
    Semigroupoid
    Category
    Semigroup
    Monoid
    Group
    Filterable
    Functor
    Contravariant
    Apply
    Applicative
    Alt
    Plus
    Alternative
    Foldable
    Traversable
    Chain
    ChainRec
    Monad
    Extend
    Comonad
    Bifunctor
    Profunctor

"

---

https://github.com/getify/Functional-Light-JS/blob/master/README.md

---

concurrency

ACM Queue

Real-world Concurrency from The Concurrency Problem Vol. 6, No. 5 - September 2008 by Bryan Cantrill and Jeff Bonwick, Sun Microsystems

https://cacm.acm.org/magazines/2008/11/538-real-world-concurrency/fulltext https://www.cs.helsinki.fi/u/kerola/rio/papers/cantrill_bonwick_2008.pdf

---

https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell

---

protocols, formats, data types, data models

in protocols sometimes you have a distinction between a 'data type' and a 'format'. A data model is more abstract, so let's start with 'format'. A format (which is similar to a syntax or a data representation) is the way that you write down a data value. For example, JSON is a format, the idea of enclosing strings in double-quotes is a syntax, and the concept of a zero-terminated "C-style" string is a data representation. A data type is a type of thing that a format/syntax/data representation is trying to represent; for example, "string" is a data type, and the double-quote syntax, the C-string representation, and the Pascal-string representation are all representations of that type. Similarly, a list can be stored as a linked list or as a contiguous array.
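To make the distinction concrete, here is a small Python sketch that writes the same value of the data type "string" in three representations: zero-terminated (C-style), length-prefixed (Pascal-style), and JSON's double-quote syntax. The helper names are hypothetical, and the sketch assumes ASCII text and a one-byte Pascal length prefix.

```python
import json


def to_c_string(text):
    # C-style: the bytes of the string followed by a zero terminator.
    return text.encode("ascii") + b"\x00"


def from_c_string(data):
    return data[:data.index(0)].decode("ascii")


def to_pascal_string(text):
    # Pascal-style (assumed one-byte prefix): length first, then the bytes.
    encoded = text.encode("ascii")
    if len(encoded) > 255:
        raise ValueError("too long for a one-byte length prefix")
    return bytes([len(encoded)]) + encoded


def from_pascal_string(data):
    length = data[0]
    return data[1:1 + length].decode("ascii")


value = "hi"
c_form = to_c_string(value)
pascal_form = to_pascal_string(value)
json_form = json.dumps(value)
```

All three byte or character sequences differ, but each decodes back to the same string value.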

A 'data model' is a data type with some additional semantics defined. For example, the RDF data model specification tells you how to encode statements such as the English sentence "Lassila is the creator of the resource http://www.w3.org/Home/Lassila". The RDF view of the semantics of this sentence can be translated from an English representation to an XML representation.

Sometimes you start with a concrete thing and generalize it to a data model; for example, you might first learn about RDF in the guise of its XML format, and then abstract that to the RDF data model. This can be confusing because the data model is so abstract, and at first it seems hard to think about it divorced from any particular format. Another thing that can happen is that you first learn about a particular standard expressed in a particular format, and then you later generalize/separate out the format for use with other things; for example, you might first learn about URLs and then later consider generalizing the syntax of URLs for use in things that are not really 'locations'. This can be confusing because it seems hard at first to think about the syntax divorced from the semantics of data that you originally learned to use it for.

---