Table of Contents for Programming Languages: a survey


Because it is so well-known and well-liked, Python gets its own chapter.



Tours and tutorials:

Best practices and style guides:

Respected exemplar code:






Opinionated comparisons:

    class Foo:
      def __init__(self, *args, **kwargs):
        self.__dict__.update(kwargs)

And it will automatically assign any keyword arguments you use as attributes to the object. For example `foo = Foo(name='Bob', age=99)`. If you still want to keep a strict list of allowed attributes, you can define them as parameters, and use a shortcut to assign all local variables to attributes.

    class Foo:
      def __init__(self, name, age):
        self.__dict__.update(locals())
        del self.self

" [37]



'Properties' (getters and setters)

e.g.:

    import math

    class Velocity(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

        @property
        def speed(self):
            return math.sqrt(self.x**2 + self.y**2)

        @speed.setter
        def speed(self, value):
            angle = math.atan2(self.x, self.y)
            self.x = math.sin(angle) * value
            self.y = math.cos(angle) * value

        @speed.deleter
        def speed(self):
            self.x = self.y = 0

Descriptor protocol

.__getattr__, .__dict__, etc.

see (about halfway down) for some short usage examples
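A minimal sketch of the descriptor protocol, for illustration only. The `Positive` and `Account` names are hypothetical and do not come from the pages referenced here. An object that defines `__get__` and `__set__` and is stored as a class attribute intercepts reads and writes of that attribute on instances:

```python
class Positive:
    """Hypothetical data descriptor that only accepts values greater than zero."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; pick a storage slot.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself, not an instance
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)


class Account:
    balance = Positive()  # instance access to `balance` goes through the descriptor

    def __init__(self, balance):
        self.balance = balance  # invokes Positive.__set__


acct = Account(10)
acct.balance = 25        # Positive.__set__
current = acct.balance   # Positive.__get__
```

The value itself ends up in the instance's `__dict__` under `_balance`; `property` is built on the same mechanism.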

Composition via getattr

e.g.:

    class ComposedDoor:
        def __init__(self, number, status):
            self.door = Door(number, status)

        def __getattr__(self, attr):
            return getattr(self.door, attr)
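A runnable sketch of the same idea. The `Door` class here is a hypothetical stand-in, since the snippet above assumes one without defining it. Note that `__getattr__` is only consulted when normal attribute lookup fails, which is why `self.door` itself does not recurse:

```python
class Door:
    """Hypothetical stand-in for the Door class assumed by the snippet above."""

    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = "open"


class ComposedDoor:
    def __init__(self, number, status):
        self.door = Door(number, status)

    def __getattr__(self, attr):
        # Reached only for names not found on ComposedDoor or its instance,
        # so everything else is forwarded to the wrapped Door.
        return getattr(self.door, attr)


d = ComposedDoor(1, "closed")
d.open()  # forwarded to Door.open
```

After this, `d.status` and `d.number` are both read from the wrapped `Door`.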




namedtuple idiom for concise constructors:

e.g.

    from collections import namedtuple

    class Position(namedtuple('Position', 'board score wc bc ep kp')):
        ...
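A small sketch of what the idiom buys you, using a hypothetical `Point` class rather than the `Position` class above. The namedtuple base supplies the constructor, field access, equality and `_replace`, so the subclass only has to add methods:

```python
from collections import namedtuple


class Point(namedtuple("Point", "x y")):
    """Hypothetical example: immutable record with a generated constructor."""

    __slots__ = ()  # keep instances as small as plain tuples

    def shifted(self, dx, dy):
        # _replace returns a new instance; the original is unchanged.
        return self._replace(x=self.x + dx, y=self.y + dy)


p = Point(1, 2)
q = p.shifted(3, 4)
```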



Internals and implementations


Core data structures: todo


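Python's grammar was LL(1) up to Python 3.8. Since Python 3.9, CPython uses a PEG-based parser (PEP 617), and the grammar is no longer required to be LL(1).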

Note that Python's grammar cannot be represented as an operator precedence grammar [46].

Number representations


Floating point: todo

In CPython, small integers (-5 through 256) are preallocated and shared as singletons [47].
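A quick way to observe the small-integer cache, as a sketch. This is a CPython implementation detail (the cached range there is -5 through 256), not a language guarantee, and the helper name `same_object` is made up for this example:

```python
import platform


def same_object(text):
    # Parse the digits twice at run time so the compiler cannot share a constant.
    return int(text) is int(text)


implementation = platform.python_implementation()
small_shared = same_object("256")  # inside the cached range on CPython
large_shared = same_object("257")  # outside it; typically two distinct objects
```

This is also why `is` should not be used to compare numbers; use `==`.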

array representation

variable-length lists: todo

multidimensional arrays: todo

limits on sizes of the above

string representation

Python strings are immutable [48]

Python's string type was changed in Python 3.3 (PEP 393) to "support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes). This will allow a space-efficient representation in common cases, but give access to full UCS-4 on all systems." [49] [50]

" Unicode structures are now defined as a hierarchy of structures, namely:

    typedef struct {
      PyObject_HEAD
      Py_ssize_t length;
      Py_hash_t hash;
      struct {
          unsigned int interned:2;
          unsigned int kind:2;
          unsigned int compact:1;
          unsigned int ascii:1;
          unsigned int ready:1;
      } state;
      wchar_t *wstr;
    } PyASCIIObject;

    typedef struct {
      PyASCIIObject _base;
      Py_ssize_t utf8_length;
      char *utf8;
      Py_ssize_t wstr_length;
    } PyCompactUnicodeObject;

    typedef struct {
      PyCompactUnicodeObject _base;
      union {
          void *any;
          Py_UCS1 *latin1;
          Py_UCS2 *ucs2;
          Py_UCS4 *ucs4;
      } data;
    } PyUnicodeObject;

" -- [51]

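The 1, 2, or 4 bytes per character mentioned above can be observed from Python, as a rough sketch. It assumes CPython 3.3 or later, where `sys.getsizeof` reports the size of the string object; the helper name `bytes_per_char` is made up for this example:

```python
import sys


def bytes_per_char(ch):
    # Compare two freshly built strings of the same kind so that the
    # fixed header cost cancels out and only the per-character cost remains.
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


widths = {
    "ascii": bytes_per_char("a"),            # U+0061
    "bmp": bytes_per_char("\u20ac"),         # U+20AC, needs 2 bytes
    "astral": bytes_per_char("\U0001F600"),  # U+1F600, needs 4 bytes
}
```

One wide character is enough to widen the whole string, since the width is chosen from the largest code point it contains.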
how strings used to be stored in older versions of Python:

The maximum length of a Python string is bounded by the largest value of Py_ssize_t [52] (which, interestingly, is signed, because many places in Python's implementation internally use negative lengths to denote counting from the end [53])
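From Python, the largest `Py_ssize_t` value is exposed as `sys.maxsize`. A small sketch, assuming the usual case where `Py_ssize_t` has the same width as a pointer:

```python
import struct
import sys

pointer_bits = struct.calcsize("P") * 8  # width of a C pointer on this build
max_length = sys.maxsize                 # largest value a Py_ssize_t can hold
```

On a typical 64-bit build this is `2**63 - 1`, far more than can actually be allocated.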

Representation of structures with fields


some examples in

Hashing and equality checks

Memory management

"There’s no chance that the reference count can overflow; at least as many bits are used to hold the reference count as there are distinct memory locations in virtual memory (assuming sizeof(Py_ssize_t) >= sizeof(void*)). Thus, the reference count increment is a simple operation." --

"Every Python object contains at least a refcount and a reference to the object's type in addition to other storage; on a 64-bit machine, that takes up 16 bytes" --

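Both points can be poked at from Python with `sys.getrefcount` and `sys.getsizeof`. This is a sketch for CPython; the absolute numbers vary between versions and builds, so only the difference in counts is meaningful here:

```python
import sys

obj = object()
before = sys.getrefcount(obj)
holder = [obj]                 # the list now holds one more reference to obj
after = sys.getrefcount(obj)

# Size of the smallest possible object: just the refcount and the type pointer.
header_size = sys.getsizeof(object())
```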
Examples of some Python objects and how much memory they take (in 64-bit Python 2.7x; from [54]):


Concurrency and the GIL

pypy-stm appears to be one of the latest attempts to remove the GIL from Python. (Stackless didn't remove the GIL; it was about microtasks. PyPy incorporated much of Stackless but also didn't remove the GIL.)

Compiled format

Variable lookup details

"Python...maps variable names to integer indices during compilation to bytecode, and the bytecode just takes those embedded constant indices and indexes into an array to obtain a local variable's value. That's a lot faster." -- Gergo Barany

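The index-based lookup can be seen with the `dis` module. A sketch, assuming CPython 3; exact opcode names differ between versions (e.g. variants whose names begin with `LOAD_FAST`), so the example only looks at name prefixes. The function names are made up for illustration:

```python
import dis

counter = 0  # module-level (global) name


def add_one(a):
    return a + 1      # `a` is a local: loaded by index into the frame's locals


def read_global():
    return counter    # not assigned locally: looked up by name in globals


def load_ops(func):
    # Collect the names of all LOAD_* instructions in a function's bytecode.
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("LOAD_")]


local_ops = load_ops(add_one)
global_ops = load_ops(read_global)
local_names = add_one.__code__.co_varnames  # index -> name table for locals
```

`dis.dis(add_one)` prints the full listing, including the integer argument that indexes into `co_varnames`.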
and metaprogramming interaction:

Q: "how does CPython resolve scope if it maps variable names to indices? In the case of `exec(input())` and say the input string is `x = 1`, how would it compile bytecode to allocate space for x and index into the value?" [56]

A: "In Python 2, the bytecode optimization that lets you access variables by index is turned off if your function has exec code in it that may modify locals.

So if your function is:

   def foo(): exec "a=1"; return a

Then running dis.dis on foo to disassemble the bytecode, you will see:

              8 LOAD_NAME                0 (a)

while you normally would see:

              6 LOAD_FAST                0 (a)

... " -- Erwin

if you try this, note the scoping details:

" speedster217:

I just tested it in Python3:

    def foo():
        exec("a = 1")
        return a

    print(foo())

Fails with a NameError:

    Traceback (most recent call last):
      File "", line 5, in <module>
      File "", line 3, in foo
        return a
    NameError: name 'a' is not defined


joncatanio:

I get the same error with your example, but this works fine (Python 3.6.4):

    exec("a = 1")
    print(a)

This will print "1".


yorwba:

That is because the exec runs in the global scope. When Python sees a variable that is not assigned to in the local scope, it is assumed to be a global variable, so when exec creates a new local variable, the load still fails because it looks into the globals dictionary.

But you can do this:

    def foo():
        exec("a = 1")
        return locals()['a']

"
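Reading the result back through `locals()` depends on how a given CPython version implements function locals. As far as I know, since Python 3.13 (PEP 667) each `locals()` call inside a function returns an independent snapshot, so that trick may no longer work there. A sketch that avoids the question by passing `exec` an explicit namespace (the helper name `run_snippet` is made up for this example):

```python
def run_snippet(source):
    namespace = {}
    # Use a throwaway globals dict and collect all assignments in `namespace`.
    exec(source, {}, namespace)
    return namespace


result = run_snippet("a = 1")
value = result["a"]
```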



Python internals retrospectives and studies

" In this study we attempt to quantify the costs of language features such as dynamic typing, reference counting for memory management, boxing of numbers, and late binding of function calls.


We find that a boxed representation of numbers as heap objects is the single most costly language feature on numeric codes, accounting for up to 43 % of total execution time in our benchmark set. On symbolic object-oriented code, late binding of function and method calls costs up to 30 %. Redundant reference counting, dynamic type checks, and Python’s elaborate function calling convention have comparatively smaller costs.


The optimizations performed by pylibjit on request aim directly at eliminating the sources of overheads discussed above.


Unboxing of numbers uses type annotations to directly use machine operations on integers and floating-point numbers that fit into machine registers. This avoids storing every number in a heap object and managing that memory. As a special case, this also turns loops of the very common form `for i in range(n)`, which would normally use an iterator to enumerate boxed numbers from 0 to n, into unboxed counting loops if i is an unboxed machine integer.


Early binding of function calls resolves the addresses of compiled or built-in functions or methods at compile time (guided by type annotations for methods’ receiver objects) and generates direct CPU call instructions. " -- Python Interpreter Performance Deconstructed by Gergö Barany
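To make "boxed" concrete, here is a rough sketch comparing a list of `int` objects with the standard-library `array` module, which stores raw machine integers. This is not pylibjit and does not reproduce the paper's measurements; it only illustrates the storage difference, and the sizes are those reported by CPython:

```python
import sys
from array import array

boxed = [10**6 + i for i in range(1000)]  # list of pointers to heap int objects
unboxed = array("q", boxed)               # contiguous signed 64-bit integers

boxed_bytes_each = sys.getsizeof(boxed[0])  # one int object, header included
unboxed_bytes_each = unboxed.itemsize       # one raw machine integer
```

Each list slot additionally costs a pointer on top of the int object it refers to.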

Python internals links


Implementations and variants

Misc links