proj-plbook-plChPythonLang

Table of Contents for Programming Languages: a survey

Python

Because it is so well-known and well-liked, Python gets its own chapter.

Pros:

Cons:

Features:

Tours and tutorials:

Best practices and style guides:

Respected exemplar code:

Gotchas:

Popularity:

Retrospectives:

Tools:

Opinions:

Python Opinionated comparisons

    class Foo:
      def __init__(self, *args, **kwargs):
        self.__dict__.update(**kwargs)

And it will automatically assign any keyword arguments you use as attributes to the object. For example `foo = Foo(name='Bob', age=99)`. If you still want to keep a strict list of allowed attributes, you can define them as parameters, and use a shortcut to assign all local variables to attributes.

    class Foo:
      def __init__(self, name, age):
        self.__dict__.update(locals())
        del self.self
" [45]
"There may also have been implementation considerations by the way. I believe that historically at least, Ruby has been a lot slower than Python, although I think the differences today aren't very large. And a lot of these tools are written as native extensions, which Python seems to support a bit better." [56] "I think people settled on python because for a spell it was really hard to install ruby (1.6-1.8), it got worse around 2.0, (by the time python also ran into these issues, it had way more momentum) and:
m... }), and the correct syntax is of course str.sub!('pat') {m... }. Kinda silly I guess and a bit of a "oh, du'h" moment; I spent two years with Ruby as my day job, but that was also five years ago, and I think these kind of mistakes aren't that unusual if you're not programming Ruby too often. And then there are caveats like using "return" where you should have used "next", and quite a few other things. Don't get me wrong, I like Ruby, and there's also a bunch of things I don't like about Python. But I can definitely understand why people settled on Python. I think Matz once said "Ruby is 80% Perl and 20% Python"; I feel that with a slightly lower Perl-to-Python-ratio Ruby would have been better." [55]

Wishes

Features

'Properties' (getters and setters)

e.g. (thx http://www.ianbicking.org/property-decorator.html )

    import math

    class Velocity(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

        @property
        def speed(self):
            return math.sqrt(self.x**2 + self.y**2)

        @speed.setter
        def speed(self, value):
            # keep the direction, rescale the magnitude
            angle = math.atan2(self.x, self.y)
            self.x = math.sin(angle) * value
            self.y = math.cos(angle) * value

        @speed.deleter
        def speed(self):
            self.x = self.y = 0

Descriptor protocol

.__getattr__, .__dict__, etc (see https://docs.python.org/3.5/howto/descriptor.html )

see https://iluxonchik.github.io/why-you-should-learn-python/ (about halfway down) for some short usage examples
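
As a rough illustration of the descriptor protocol named above, the sketch below defines a data descriptor that validates assignments. The names `Positive` and `Order` are hypothetical and do not come from the linked pages.

```python
class Positive:
    """Data descriptor that only accepts values greater than zero."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; pick a storage slot.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)


class Order:
    quantity = Positive()  # attribute access is routed through the descriptor

    def __init__(self, quantity):
        self.quantity = quantity  # triggers Positive.__set__


order = Order(3)
print(order.quantity)  # 3
```

Properties are built on the same mechanism: `property` is itself a descriptor class.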

Composition via getattr

eg (thx http://lgiordani.com/blog/2014/08/20/python-3-oop-part-3-delegation-composition-and-inheritance/ ):

    class ComposedDoor:
        def __init__(self, number, status):
            self.door = Door(number, status)

        def __getattr__(self, attr):
            return getattr(self.door, attr)
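
A point the snippet leaves implicit is that `__getattr__` is only consulted when normal lookup fails, so a wrapper can override some methods and delegate the rest. A minimal sketch, assuming a hypothetical `Door` class and a hypothetical wrapper named `LoudDoor`:

```python
class Door:
    """Hypothetical delegate with a little state and behaviour."""

    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = "open"

    def close(self):
        self.status = "closed"


class LoudDoor:
    """Wraps a Door: overrides close(), delegates everything else."""

    def __init__(self, number, status):
        self.door = Door(number, status)

    def close(self):
        # Found by normal lookup, so __getattr__ is not consulted.
        self.door.close()
        return "slam"

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        return getattr(self.door, attr)


door = LoudDoor(1, "closed")
door.open()          # delegated to Door.open
print(door.status)   # open
print(door.close())  # slam
```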

Super

https://docs.python.org/2/library/functions.html#super

http://rhettinger.wordpress.com/2011/05/26/super-considered-super/
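
A small sketch of cooperative `super()` calls under multiple inheritance may help when reading the links above. The class names are made up for illustration; the point is that `super()` follows the method resolution order of the instance's class, not the static parent.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # In a Child instance, the next class after Left is Right, not Base.
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


print([cls.__name__ for cls in Child.__mro__])
# ['Child', 'Left', 'Right', 'Base', 'object']
print(Child().describe())
# ['Child', 'Left', 'Right', 'Base']
```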

Multiprocessing

http://www.jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/

Imports

https://tenthousandmeters.com/blog/python-behind-the-scenes-11-how-the-python-import-system-works/

discussion: https://news.ycombinator.com/item?id=27941208

Practices

namedtuple idiom for concise constructors:

from collections import namedtuple

e.g. class Position(namedtuple('Position', 'board score wc bc ep kp')):
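
A minimal sketch of how this idiom plays out, using a cut-down, hypothetical field list (`board`, `score`) and a made-up method `with_bonus`: the constructor, `repr`, and equality come from `namedtuple`, and the subclass only adds behaviour.

```python
from collections import namedtuple


class Position(namedtuple("Position", "board score")):
    """Immutable record; the constructor comes from namedtuple."""

    __slots__ = ()  # no per-instance __dict__, so instances stay tuple-sized

    def with_bonus(self, points):
        # _replace returns a new instance with the given field changed.
        return self._replace(score=self.score + points)


start = Position(board="rnbqkbnr", score=0)
later = start.with_bonus(5)
print(later)  # Position(board='rnbqkbnr', score=5)
```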

Interop

Opinions

Internals and implementations

Links:

Core data structures: todo

Grammar

https://docs.python.org/3/reference/grammar.html

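Python's grammar was historically LL(1). Since Python 3.9, CPython parses it with a PEG parser (PEP 617), and newer syntax is no longer restricted to LL(1).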

Note that Python's grammar cannot be represented as an operator precedence grammar [74].

Number representations

Integers

Floating points todo

In CPython, small integers (from -5 through 256) are preallocated and shared as singletons [75].
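
The caching of small integers can be observed with an identity check. This is a sketch that assumes CPython; other implementations may behave differently, and the helper name `shares_object` is made up for illustration. On CPython I would expect it to report True for 256 and False for 257.

```python
import sys


def shares_object(value):
    """Return True if two separately parsed ints are the same object."""
    # Parsing from a string builds the number at run time, so the compiler
    # cannot merge the two results into one shared constant.
    first = int(str(value))
    second = int(str(value))
    return first is second


print(sys.implementation.name)
print(256, shares_object(256))
print(257, shares_object(257))
```

This is an implementation detail, which is why `==` rather than `is` should be used to compare numbers.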

array representation

variable-length lists: todo

multidimensional arrays: todo

limits on sizes of the above

string representation

Python strings are immutable [76]

Python strings "support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes). This will allow a space-efficient representation in common cases, but give access to full UCS-4 on all systems." [77] [78]

" Unicode structures are now defined as a hierarchy of structures, namely:

    typedef struct {
      PyObject_HEAD
      Py_ssize_t length;
      Py_hash_t hash;
      struct {
          unsigned int interned:2;
          unsigned int kind:2;
          unsigned int compact:1;
          unsigned int ascii:1;
          unsigned int ready:1;
      } state;
      wchar_t *wstr;
    } PyASCIIObject;

    typedef struct {
      PyASCIIObject _base;
      Py_ssize_t utf8_length;
      char *utf8;
      Py_ssize_t wstr_length;
    } PyCompactUnicodeObject;

    typedef struct {
      PyCompactUnicodeObject _base;
      union {
          void *any;
          Py_UCS1 *latin1;
          Py_UCS2 *ucs2;
          Py_UCS4 *ucs4;
      } data;
    } PyUnicodeObject;

" -- [79]

how strings used to be stored in older versions of Python: http://www.laurentluce.com/posts/python-string-objects-implementation/

Python string max length is given by Py_ssize_t [80] (which, interestingly, is signed, because many places in Python's implementation internally use negative lengths to denote counting from the end [81])
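
The 1, 2, or 4 byte representations described above can be observed indirectly from Python. A minimal sketch, assuming CPython 3.3 or later (where `sys.getsizeof` reflects the flexible representation); the helper name `bytes_per_char` is made up for illustration.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by growing a string of one repeated char."""
    short_text = ch * 1
    long_text = ch * (n + 1)
    # Both strings share the same header size, so the difference is pure payload.
    return (sys.getsizeof(long_text) - sys.getsizeof(short_text)) // n


for label, ch in [("ASCII", "a"), ("BMP", "\u0394"), ("astral", "\U0001F600")]:
    print(label, bytes_per_char(ch))
```

A single wide character is enough to widen the whole string, since the width is chosen by the largest ordinal present.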

Representation of structures with fields

todo

some examples in https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/

Hashing and equality checks

Memory management

"There’s no chance that the reference count can overflow; at least as many bits are used to hold the reference count as there are distinct memory locations in virtual memory (assuming sizeof(Py_ssize_t) >= sizeof(void*)). Thus, the reference count increment is a simple operation." -- https://docs.python.org/2/c-api/intro.html

"Every Python object contains at least a refcount and a reference to the object's type in addition to other storage; on a 64-bit machine, that takes up 16 bytes" -- http://stackoverflow.com/questions/10365624/sys-getsizeofint-returns-an-unreasonably-large-value/10365639#10365639

Examples of some Python objects and how much memory they take (in 64-bit Python 2.7x; from [82]):
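
Figures of this kind can be measured on a local interpreter with `sys.getsizeof`. The sketch below is only a way to take such measurements; it does not reproduce the numbers from [82], and results vary with the Python version and platform.

```python
import sys

samples = {
    "object()": object(),
    "int 1": 1,
    "float 1.0": 1.0,
    "empty str": "",
    "empty tuple": (),
    "empty list": [],
    "empty dict": {},
}

for label, obj in samples.items():
    # getsizeof reports the object's own footprint, not what it references.
    print(f"{label:12} {sys.getsizeof(obj)} bytes")
```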

Links:

Concurrency and the GIL

pypy-stm appears to be one of the latest attempts to remove the GIL from Python (Stackless didn't remove the GIL, it was about microtasks; PyPy incorporated much of Stackless but also didn't remove the GIL).

Compiled format

Variable lookup details

"Python...maps variable names to integer indices during compilation to bytecode, and the bytecode just takes those embedded constant indices and indexes into an array to obtain a local variable's value. That's a lot faster." -- Gergo Barany

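The name-to-index mapping can be inspected on a live function. A minimal sketch, assuming CPython; exact opcode names differ between versions, so the code only filters for opcodes containing "FAST" rather than expecting a specific listing. The function `add` is a made-up example.

```python
import dis


def add(x, y):
    total = x + y
    return total


# Local names are stored in a tuple; bytecode refers to them by position.
print(add.__code__.co_varnames)  # ('x', 'y', 'total')

# Opcodes that touch locals carry "FAST" in their name and an integer argument.
fast_ops = [
    (ins.opname, ins.arg, ins.argval)
    for ins in dis.get_instructions(add)
    if "FAST" in ins.opname
]
print(fast_ops)
```
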
and metaprogramming interaction:

Q: "how does CPython resolve scope if it maps variable names to indices? In the case of `exec(input())` and say the input string is `x = 1`, how would it compile bytecode to allocate space for x and index into the value?" [85]

A: "In Python 2, the bytecode optimization that lets you access variables by index is turned off if your function has exec code in it that may modify locals.

So if your function is:

   def foo(): exec "a=1"; return a

Then running dis.dis on foo to disassemble the bytecode it you will see:

              8 LOAD_NAME                0 (a)

while you normally would see:

              6 LOAD_FAST                0 (a)

... " -- Erwin

if you try this, note the scoping details:

" speedster217 7 days ago [-]

I just tested it in Python3:

    def foo():
        exec("a=1")
        return a
    print(foo())

Fails with a NameError:

    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        print(foo())
      File "test.py", line 3, in foo
        return a
    NameError: name 'a' is not defined

joncatanio 7 days ago [-]

I get the same error with your example, but this works fine (Python 3.6.4):

    exec("a = 1")
    print(a)

This will print "1".

yorwba 7 days ago [-]

That is because the exec runs in the global scope. When Python sees a variable that is not assigned to in the local scope, it is assumed to be a global variable, so when exec creates a new local variable, the load still fails because it looks into the globals dictionary.

But you can do this:

  def foo():
    exec("a = 1")
    return locals()['a']

"

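One way to sidestep these scoping questions in Python 3 is to give `exec` an explicit namespace and read results back from it, which does not depend on how a given interpreter version treats function locals. A minimal sketch; the helper name `run_snippet` is made up for illustration.

```python
def run_snippet(source):
    """Execute source in a fresh namespace and return the names it defined."""
    namespace = {}
    # With an explicit dict, exec stores new names there instead of
    # trying to create function locals.
    exec(source, namespace)
    # exec adds __builtins__ to the globals dict; hide it from the result.
    return {k: v for k, v in namespace.items() if k != "__builtins__"}


print(run_snippet("a = 1"))  # {'a': 1}
```
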
Python Internals retrospectives and studies

" In this study we attempt to quantify the costs of language features such as dynamic typing, reference counting for memory management, boxing of numbers, and late binding of function calls

...

We find that a boxed representation of numbers as heap objects is the single most costly language feature on numeric codes, accounting for up to 43 % of total execution time in our benchmark set. On symbolic object-oriented code, late binding of function and method calls costs up to 30 %. Redundant reference counting, dynamic type checks, and Python’s elaborate function calling convention have comparatively smaller costs.

...

The optimizations performed by pylibjit on request aim directly at eliminating the sources of overheads discussed above.

...

Unboxing of numbers uses type annotations to directly use machine operations on integers and floating-point numbers that fit into machine registers. This avoids storing every number in a heap object and managing that memory. As a special case, this also turns loops of the very common form for i in range(n) , which would normally use an iterator to enumerate boxed numbers from 0 to n , into unboxed counting loops if i is an unboxed machine integer.

...

Early binding of function calls resolves the addresses of compiled or built-in functions or methods at compile time (guided by type annotations for methods’ receiver objects) and generates direct CPU call instructions. " -- Python Interpreter Performance Deconstructed by Gergö Barany
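
To make the boxing cost concrete, the sketch below compares the storage of one boxed float object with one unboxed double held in an `array`. It illustrates only the memory side of boxing and says nothing about the timings reported in the study or about pylibjit itself.

```python
import sys
from array import array

boxed = [float(i) for i in range(1000)]  # list of pointers to float objects
unboxed = array("d", boxed)              # contiguous C doubles

per_float_object = sys.getsizeof(boxed[0])  # one heap-allocated float
per_array_item = unboxed.itemsize           # one raw double

print(per_float_object, per_array_item)
```

The list additionally stores one pointer per element, so the boxed form pays for the pointer and the object header on top of the number itself.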

Some issues with the CPython C API

Python Internals links

Extensions

Implementations and variants

Implementation: Cython

"I use it for some things and straight C++ for others. Cython is good for simple things, like small tight loops. One thing that is much more reliable in pure C++ is any kind of threading or OpenMP etc. Theoretically Cython has it, in practice it can cause very weird problems. Also if you want to use C (or C++) libs in Cython, you have to manually declare all the function prototypes before using them. Also Cython has a tendency to make minor breaking changes to syntax every version, breaking my code and/or making it version-dependent. Since I distribute to others, the best method for linking performance intensive-code to Python I have found is: 1. Write the core in C++ 2. extern "C" a simple C API for it 3. Compile both to a SO 4. Access the SO from ctypes. It is more robust than Cython IME. Another advantage of this approach is that you have now have a generic C library you can use anywhere else if you want, not just Python. And you don't have to link -lpython so the C lib can be used in places where Python isn't installed. Finally, it can be nice to have some C++ binaries for startup time, and AFAIK you can't compile binaries with setuptools." [86]

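Step 4 of the approach quoted above (calling a C API through ctypes) can be sketched without a compiler by using the platform C library as a stand-in for one's own shared object. This assumes a POSIX system; with a real project, the path would point at the library built in steps 1 to 3, and `c_strlen` is just an illustrative wrapper name.

```python
import ctypes
import ctypes.util

# Stand-in for a hand-built .so: locate and load the C standard library.
# If no path is found, CDLL(None) falls back to symbols already loaded
# into the process (POSIX only).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature so ctypes converts arguments and results correctly:
# size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Thin Python wrapper around the C function."""
    return libc.strlen(data)


print(c_strlen(b"hello"))  # 5
```

Declaring `argtypes` and `restype` takes the place of the function prototypes that Cython requires, but it happens at run time and needs no build step on the Python side.
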
Lists of Implementations and variants

Taichi variant / DSL / extension / language

https://www.taichi-lang.org/

https://docs.taichi-lang.org/blog/accelerate-python-code-100x

concurrency / GPU-focused numerics extension

PyPy implementation

https://pypy.org/

Discussion on incremental garbage collection (GC) in PyPy:

Discussion on PyPy-CPython extension interop:

PyPyJS

http://pypyjs.org/

Stackless implementation

Stackless (see also [87], [88], [89] [90])

TinyPy variant

tinypy (about 80k of code)

PyMite (embedded) variant

PyMite

MicroPython (embedded) variant

http://micropython.org/

Brython variant

https://brython.info/

"A Python 3 implementation for client-side web programming...Brython is designed to replace Javascript as the scripting language for the Web"

"It renders quite well a lot of Python features. Even things like yield from, sets, generator.send(), dict.fromkeys, multiple inheritance and 0b1100101. But there are several problems with brython: - it can't render some obscure Python features. It's very hard to do a complete Python implementation. E.g: metaclass will fail." [91]

some discussion here:

RustPython variant

https://github.com/RustPython/RustPython

Chocopy variant

https://chocopy.org/

MyPy variant

https://github.com/python/mypy/tree/master/mypyc

Skylark variant

https://docs.bazel.build/versions/master/skylark/language.html

https://github.com/bazelbuild/starlark

Snek variant

Snek https://sneklang.org/doc/snek.html

"Snek is a tiny embeddable language targeting processors with only a few kB of flash and ram. Think of something that would have been running BASIC years ago and you'll have the idea. These processors are too small to run MicroPython. Snek borrows semantics and syntax from python, but only provides a tiny subset of that large language. The goal is to have Snek programs able to run in a full Python (version 3) implementation so that any knowledge gained in learning Snek will transfer directly to learning Python."

MyCpp variant

https://github.com/oilshell/oil/blob/master/mycpp/README.md "A tool that translates a subset of statically-typed Python to C++." "an experimental Python-to-C++ translator based on MyPy. It only handles the small subset of Python that Oil uses."

Mycpp

Shedskin variant

http://shedskin.github.io/

    eval, getattr, hasattr, isinstance, anything really dynamic
    arbitrary-size arithmetic (integers become 32-bit (signed) by default on most architectures, see Command-line options)
    argument (un)packing (*args and **kwargs)
    multiple inheritance
    nested functions and classes
    unicode
    inheritance from builtins (excluding Exception and object)
    overloading __iter__, __call__, __del__
    closures

Some other features are currently only partially supported:

    class attributes must always be accessed using a class identifier:

        self.class_attr # bad
        SomeClass.class_attr # good
        SomeClass.some_static_method() # good

    function references can be passed around, but not method references or class references, and they cannot be contained:
    ..."

Please BUILD variant

Please BUILD

form)."

Pycopy variant

https://github.com/pfalcon/pycopy

https://github.com/pycopy/PycoEPs/blob/master/StrictMode.md

py2many transpiler variant

https://github.com/adsharma/py2many/blob/main/doc/langspec.md

Kuroko variant

https://github.com/kuroko-lang/kuroko

"Dialect of Python with explicit variable declaration and block scoping, with a lightweight and easy-to-embed bytecode compiler and interpreter." [92]

"Importantly, it avoids a few gotchas in Python such as default parameters, and scoping rules, but stays generally compatible with Python." [93]

Oil's old OVM

Oil used to reuse a subset of the Python virtual machine implementation. These pages indicate what is in the subset:

context: https://lobste.rs/s/p8kn8u/how_python_was_shaped_by_leaky_internals#c_nroxen

Violet (Python in Swift implementation)

https://forums.swift.org/t/violet-python-vm-written-in-swift/56945

discussion:

CPython compiled to webassembly (wasm) implementations

some discussion here:

Cannoli

https://github.com/joncatanio/cannoli

"Cannoli is a compiler for a subset of Python 3.6.5 and is designed to evaluate the language features of Python that negatively impact performance. ... Cannoli supports a subset of Python 3.6.5, its current state omits many features that could not be completed during the duration of the thesis. The main omissions are exceptions and inheritance. ... Cannoli supports two major optimizations that come as a result of applying restrictions to the language. Restrictions are placed on the Python features that provide the ability to delete or inject scope elements and the ability to mutate the structure of objects and classes at run time. "

https://digitalcommons.calpoly.edu/theses/1886/

https://news.ycombinator.com/item?id=17093051

Minimal Viable Python variant

https://snarky.ca/mvpy-minimum-viable-python/

A core language of 15 constructs that the rest of Python can be lowered to (according to this author).

S6 optimized implementation

https://github.com/deepmind/s6

Mojo variant

https://www.modular.com/mojo

Misc links