Table of Contents for Programming Languages: a survey
Python
Because it is so well-known and well-liked, Python gets its own chapter.
Pros:
- Readable
- "There's only one obvious way to do it" (TOOWTDI)
- pervasive data structure protocols: sequences and map ('dict')
Cons:
- GIL
- "Anything to do with package distribution... There are problems with version skew and dependencies that just make for an "endless mess". He dreads it when a colleague comes to him with a "simple Python question". Half the time it is some kind of import path problem and there is no easy solution to offer." -- summary of the Python founder, Guido van Rossum's answer to the question "what he hates in Python" [1]
Features:
Tours and tutorials:
Best practices and style guides:
Respected exemplar code:
Gotchas:
- http://www.toptal.com/python/top-10-mistakes-that-python-programmers-make
- http://lignos.org/py_antipatterns/
- in Python 3 (but not Python 2), 'type('a'[0]) == type('a') == str', and strings are iterables, so if you try to recurse on a tree represented as nested lists with string leaves by checking for iterable-ness and iterating them if so, you'll go into an infinite loop, because a 1-character string is iterable, but the first element in it is another 1-character string...
- "I find the order of iteration in Python for comprehensions totally confusing every time. If you want to convert: for a in x: pass into a comprehension, you write: [pass for a in x] Simple, right? So to convert for a in x: for b in y: pass I obviously write: [pass for b in y for a in x] Bzzt. Nope, you have to write it in middle-endian order to get the same behaviour." [18]
- "One of the biggest design mistakes of iterators in Python is that StopIteration bubbles if not caught. This can cause very frustrating problems where an exception somewhere can cause a generator or coroutine elsewhere to abort." [19]
- "...many Python modules include initialization functions that run during the import. You don't know what's running, you don't know what it does, and you might not notice. Unless there's a namespace conflict, in which case you get to spend many fun hours tracking down the cause." [20]
- If you assign a reference value into a new variable, it doesn't create a new copy of the value (because it's a reference type, not a value type) [21]
- True is 1 and False is 0. So if you do dict_a[1] = "apple"; dict_a[True] = "mango", the second assignment overwrites the first. [22]
- list.append, list.sort, dict.update, and others update in-place, and don't return the resulting container. So e.g. list_a = list_a.append(6) sets list_a to None. [23]
- in some cases, strings are interned, so two 'different' strings with the same value are actually the same (and this is exposed via the 'is' operator) [24]
- default arguments are evaluated only once. So if you have 'def func(a, lst=[])' and then within this function you call list.append(), you mutate the one copy of lst, and the next time this function is called, lst will no longer be [] [25]
- in at least some cases, variables captured by closures must be marked by the 'nonlocal' keyword (see e.g. [26])
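- closures look up captured variables when they are called, not when they are defined, so every lambda below sees the final value of i [27]:
  >>> powers_of_x = [lambda x: x**i for i in range(10)]
  >>> [f(2) for f in powers_of_x]
  [512, 512, 512, 512, 512, 512, 512, 512, 512, 512]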
- "This PEP ( https://www.python.org/dev/peps/pep-0292/ ) tries to replace % as an operator, because it should not be used for string formatting. " -- [28]
- https://github.com/satwikkansal/wtfpython
- https://stackoverflow.com/questions/17202207/why-does-true-false-is-false-evaluate-to-false
- "Once I wrapped my head around when you can and can't use relative imports, I've been pretty ok with them. The thing that irks me is whether they work changes based on where you've invoked Python from. `./bin/my_script.py` behaves differently from `./my_script.py`."
- using a mutable default value [29]
- https://codereviewdoctor.medium.com/5-of-666-python-repos-had-comma-typos-including-tensorflow-and-pytorch-sentry-and-v8-7bc3ad9a1bb7
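Several of the gotchas listed above can be reproduced in a few lines. The following is a minimal sketch for Python 3; the function and variable names (`append_bad`, `append_good`, `flatten`, and so on) are made up for illustration and do not come from the cited sources.

```python
# Mutable default argument: the default list is created once, at definition time.
def append_bad(item, lst=[]):
    lst.append(item)
    return lst

def append_good(item, lst=None):
    # Common workaround: use None as a sentinel and build a fresh list per call.
    if lst is None:
        lst = []
    lst.append(item)
    return lst

first_bad = append_bad(1)
second_bad = append_bad(2)  # the same list object as first_bad

# Late-binding closures: each lambda looks up i when called, not when defined.
late = [lambda x: x ** i for i in range(3)]
# Workaround: bind the current value of i as a default argument.
early = [lambda x, i=i: x ** i for i in range(3)]

# True == 1 and hash(True) == hash(1), so both refer to the same dict key.
d = {}
d[1] = "apple"
d[True] = "mango"

# In-place methods mutate the container and return None.
nums = [3, 1, 2]
result = nums.sort()

# Nested comprehension: the for clauses appear in the same order as the nested loops.
pairs = [(a, b) for a in "xy" for b in "12"]
loop_pairs = []
for a in "xy":
    for b in "12":
        loop_pairs.append((a, b))

def flatten(tree):
    # Treat str as a leaf explicitly; a 1-character string would otherwise recurse forever.
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(flatten(child))
    return out
```

In this sketch `first_bad` and `second_bad` end up as the same list, every function in `late` uses the last value of `i`, `d` holds a single entry, and `result` is `None` while `nums` is sorted.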
Popularity:
Retrospectives:
Tools:
Opinions:
- "steady and reliable workhorse for years..conservative language..Python isn't going to get super performant..doesn't currently deal with the modern web (async, concurrency) as cleanly as I'd like" -- [31]
- "I've always found the def __init__ method with 4 (!) underscores one of the ugliest things I've seen in a programming language." [32] (but see [33])
- "Take something like __str__ and __repr__ combined with str(), repr(), vars(), dir(), print(), pprint.pprint()... I have no idea how anyone ever thought it was a good idea to have that many ways to output the content of a variable. Even after all the time I used python I never managed to find a consistent way to output variable values." [34]
- " A lot of people like the Python syntax. It has to be about the most readable of all computer languages. " [35]
- "Python really has more than just batteries included, it's probably the language with the most available libraries after C++." jonathanstrange
- but pjmlp replies "Java followed by .NET, beats both regarding available libraries, unless you mean only open source ones. And JavaScript with their one liner packages probably beats all." pjmlp
- http://xahlee.info/comp/python_problems.html
- complaints and suggestions about Python's import system: https://news.ycombinator.com/item?id=24252425
- "Ugh... The type hierarchy of Python is painfully intertwined with the implementation of its type system. That implementation varies by Python implementation, and since the Python language is as old as Visual Basic, Borland Pascal, Werther's caramels, and trilobites, it varies dramatically over time. Add to that the fact that Python has a philosophy of wearing leaky abstractions proudly on its sleeve and barfing implementation-level details at its users. Python really is one of the most complex, arbitrary, and poorly abstracted languages in popular use. I have no idea how anyone lives with it for anything except writing extremely repetitive, formulaic, and superficial code, because having to dig into its guts feels like a nightmare. Even Java's type system, as antiquated and simplistic as it is, is easier to understand and work with by comparison. " -- [36]
- "in order to do package management, you have to create a fake python installation and bend PYTHONPATH. Virtualenvs are the canonical way to do it, but to me it feels like a hack - the core language seems to wants all packages installed in /usr. So now I have all these virtualenvs lying around and they are detached from the scripts."
- "Python import system is by far the worst one I dealt with. Using Setup.py and regular or namespace packages, relative import, having complex sub packages and cross importing, running a script from somewhere inside one of your sub packages, and many more craps like these. Import system must be intuitive and easy to use! " [37]
- " Yeah it really tripped me up as a beginner. I think the hardest part to get used to was that the import syntax actually changes based on how, and from where, you are running your code. So depending on how you call your code, your imports might or might not work. This is ESPECIALLY painful when you are building a distribution. There is no syntax that works for all situations, which seems like it would be pretty important for an import system. I had to bookmark this tab, and still refer to it often." [38]
- "Problems are installing libraries and actually distributing your code so others can use it" -- [39]
- "...I...want a do-while" [40]
- https://news.ycombinator.com/item?id=20672051
- "...I’ve never found myself needing lambdas in Python. The statement-oriented nature of the language and the ability to write a function definition basically anywhere means that any place I would reach for a lambda in another language (filtering over a list), I just give the “anonymous function” a name and then use that name immediately below. It’s “hacky” or “uglier”, but it’s also immediately obvious what’s happening and not any slower." -- [41]
- https://avi.im/blag/2023/refactoring-python/ (this is really a complaint about dynamic typing)
- https://news.ycombinator.com/item?id=36018621 (comment on https://kobzol.github.io/rust/python/2023/05/20/writing-python-like-its-rust.html )
- https://lobste.rs/s/dk82je/writing_python_like_it_s_rust (another comment on https://kobzol.github.io/rust/python/2023/05/20/writing-python-like-its-rust.html )
- https://jeffknupp.com/blog/2017/09/15/python-is-the-fastest-growing-programming-language-due-to-a-feature-youve-never-heard-of/
- https://www.linuxjournal.com/article/3882
- https://news.ycombinator.com/item?id=37317590
Python Opinionated comparisons
- "I prefer the more talkative/english style of ruby and the use of functions over list and dict comprehension." "I find chained enumerator / block style code much easier to write and understand than list / dict comprehensions too. It's probably the single biggest thing that I don't like about Python; lambdas are simply too awkward, but lambdas + monadic transformations are probably my primary mental abstractions for writing software these days." [42] [43]
- "...there doesn't seem to be built-in support in Python for having attributes initialized by the constructor. For example in Perl 6, I can write class Point {has $.x; has $.y;} ... and I get a constructor Pair.new(x => 1, y => 42) for free, no need to write custom initializers." [44] (but "Nothing built in, and generally not recommended, but you can do it! If you want to be really flexible and just allow any attribute, you can do this:
class Foo:
    def __init__(self, *args, **kwargs):
        self.__dict__.update(**kwargs)
And it will automatically assign any keyword arguments you use as attributes to the object. For example `foo = Foo(name='Bob', age=99)`. If you still want to keep a strict list of allowed attributes, you can define them as parameters, and use a shortcut to assign all local variables to attributes.
class Foo:
    def __init__(self, name, age):
        self.__dict__.update(locals())
        del self.self" [45])
- vs. Java: "I felt I had been wasting my time writing XMLs and type hierarchies, while there was a simpler way to do about everything: dynamic typing gave me polymorphism for free, the data structures were built-in and had literals, string manipulation was just amazingly easy, you could have standalone functions and pass them like values, a lot of the Java design patterns were reduced to one-liners...But besides it feeling better than Java, what’s still unique for me about Python is the set of principles that are best expressed in the Zen of Python. It felt like every bit of the language followed those principles and it encouraged you to do the same with your own code;..." [46]
- Common Lisp vs Python: "...it just didn’t feel right, in a lot of ways it was the opposite of what I loved about Python: the code was very difficult to read (not because parens or prefix notation, just because it was filled with symbols), the operator set was huge, with really weird names and not dynamic at all, I remember it having like six equality operators for different types. Add to that it’s an old language with a small community and zero chances of getting a job using it..." [47]
- "Truth be told, I don’t think I miss Python all that much, at least not while I’m doing Clojure. JavaScript is a different story, JavaScript is a mess. But still, if you bend it in the right directions, it can be a fairly decent functional language. I noticed that Python can’t, there’s stuff that just doesn’t work that way (lambdas come to mind), and that probably would annoy me nowadays." [48]
- "I have found Python to break down as I scale it into a larger system spanning multiple directories and modules. It's fine for bashing out 3-4 file programs. It's a decently high-entropy language. It's a bloody lousy hacking language because of the prescriptiveness. I'd rather use Common Lisp, and I do, for personal stuff." [49]
- "I've never seen such a leaky set of abstractions as Python's collection library. It's ridiculous how very often I find something irritating and the reason behind it is, "This was the first implementation and therefore was crowned 'the simplest.'" That seems to be the pattern in general with Python. A great example is Python's complex iteration system. You have 2 ways to deal with it; a sophisticated but very limited list comprehension syntax and a absolutely primitive direct object iterator that barely abstracts over a coroutine." [50]
- "Python is the "second best language for everything"" [51]
- http://norvig.com/python-lisp.html
- "Peter Norvig here. I came to Python not because I thought it was a better/acceptable/pragmatic Lisp, but because it was better pseudocode. Several students claimed that they had a hard time mapping from the pseudocode in my AI textbook to the Lisp code that Russell and I had online. So I looked for the language that was most like our pseudocode, and found that Python was the best match. Then I had to teach myself enough Python to implement the examples from the textbook. I found that Python was very nice for certain types of small problems, and had the libraries I needed to integrate with lots of other stuff, at Google and elsewhere on the net. I think Lisp still has an edge for larger projects and for applications where the speed of the compiled code is important. But Python has the edge (with a large number of students) when the main goal is communication, not programming per se. In terms of programming-in-the-large, at Google and elsewhere, I think that language choice is not as important as all the other choices: if you have the right overall architecture, the right team of programmers, the right development process that allows for rapid development with continuous improvement, then many languages will work for you; if you don't have those things you're in trouble regardless of your language choice." [52]
- " I've had a similar experience, lately while writing and editing pieces for Code Quarterly--I've written the same basic algorithms in Javascript, Python, and Common Lisp to play around with them. I find the Python the best vehicle for conveying the algorithms despite being more fluent in Common Lisp." gigamonkey
- "When I took an AI class at my university with AI: A Modern Approach (the book was good, if a little difficult to understand at times), we had a couple projects that we had to write in lisp. First project: Problem Solving Agent for Traveling Salesman Problem. 1. Depth First Search (function argument- DFS) 2. Iterative Depth First Search (function argument- IDFS) 3. A* - Heuristic: Path Cost Incurred (function argument- PATH) 4. A* - Heuristic: Minimum Spanning Tree heuristic (function argument- MST) 5. (Extra Credit 25 points ) Create and implement a heuristic equal to or better than MST. Second project: In this project we implement a decision tree induction algorithm in Lisp. I had played around with lisp before this point and found it fascinating. I approached these projects with excitement. But even with 8 years of serious programming experience, I could not for the life of me solve these problems in lisp. My problems included: 1. Knowing exactly what methods I wanted to call and use and either a. Not being able to find them in any reference I found online, or b. Finding out that they don't exist, and you have to write them yourself, or c. Finding them and shaking my head at how ridiculously they were named. 2. Not being able to read the code I had just written. 3. Not being able to debug. 4. Finding that manipulating common data structures like hash tables is a total chore. Eventually I gave up. I had spent about two hours trying to implement the project I had already solved in my head into common lisp and was making little or no progress. So I fired up another vim terminal, solved the project in Python in about 30 minutes, including debugging, and then manually translated the code into lisp. When project 2 rolled around, I decided to give it another go, but I quickly became frustrated again. Maybe my mind just isn't able to grok lisp? Maybe I'm just not smart enough? 
All I'm claiming is that I am an example of a student who was already very knowledgeable about programming and completely unable to adapt to lisp." AndyKelley
- http://thume.ca/2019/04/29/comparing-compilers-in-rust-haskell-c-and-python/
- " I don't see how typing F# is slower, longer or less convenient than typing Python (spoiler alert, it's not). And you also get things like actual lambdas, pattern matching, currying, real parallelism and more and more. It's only that people believe there is no need to learn anything beyond Python because it's "easy", which it's not, once you go beyond several hundred lines of code. But the myth somehow continues to persist. " [53]
- vs. Scheme https://www.draketo.de/proj/py2guile/
- vs. Ruby "I really wish ruby had won over python as the "general purpose scripting language", I like it better. For example, I wish all the "big data" tooling was written in Ruby. But python is just the lingua franca. PySpark, Pandas, Airflow, etc are all python. I'm not really going anywhere with this, just lamenting I don't get to do more Ruby in my day job. I recently switched to writing python at home just because my productivity is so much better because I've memorized the standard library better from using it for years at work." [54]
- "I worked with both Python and Ruby daily for several years, both quite a few years ago and now just write the occasional scripts in those languages (small scripts, maintenance of older projects, patches for various projects when needed, etc.) I much prefer Python for this. As an occasional Python/Ruby programmer I find it's just so much easier to work with. Ruby has a lot of features, which is part of its appeal, but that also means a lot of remember and ... a lot to forget. If you're not steeped in it, then that makes things quite a bit harder. I programmed some Ruby a few weeks ago; it had been quite a while since I last did any Ruby, and ended up having to look up the syntax for blocks. This is really basic stuff, but I had written str.gsub!('pat', { |m| ... }), and the correct syntax is of course str.sub!('pat') { | m | ... }. Kinda silly I guess and a bit of a "oh, du'h" moment; I spent two years with Ruby as my day job, but that was also five years ago, and I think these kind of mistakes aren't that unusual if you're not programming Ruby too often. And then there are caveats like using "return" where you should have used "next", and quite a few other things. Don't get me wrong, I like Ruby, and there's also a bunch of things I don't like about Python. But I can definitely understand why people settled on Python. I think Matz once said "Ruby is 80% Perl and 20% Python"; I feel that with a slightly lower Perl-to-Python-ratio Ruby would have been better." [55]
"There may also have been implementation considerations by the way. I believe that historically at least, Ruby has been a lot slower than Python, although I think the differences today aren't very large. And a lot of these tools are written as native extensions, which Python seems to support a bit better." [56] "I think people settled on python because for a spell it was really hard to install ruby (1.6-1.8), it got worse around 2.0, (by the time python also ran into these issues, it had way more momentum) and:
- schools (high schools, colleges) were teaching python, because "it forced students into code indentation discipline"
- linux distros shipped with python because core features were being written in python" [57]
- "On a practical note, Ruby's heavy use of reflection, inheritance and method_missing style dispatch currently makes performance for big data tasks less than ideal. Python is less expressive but its "one true way" makes things like vectorization and type specialization easier for data tasks. Sometimes you really just need a better Fortran." [58]
- "The dynamic inheritance part of Ruby is still core even in a non-Rails ecosystem. That does make optimizing performance more difficult. But on a much simpler level, a lot of Pythonic data code is just functions + values, without them being complected in an object. That makes a lot of stuff like wrapping numerical libraries in C very easy in Python. I think you'd still have a bit of a culture shock writing procedural, return-by-value code in Ruby. That's pretty normal in Python and I think one of the (many) reasons it got picked up by STEM disciplines in academia. Looks over shoulder at awful code in Numerical Recipes in Fortran book. Different strokes for different folks and all that, I just think there are probably constraints (and cultural values) that Python has that make it more suited to data problems than Ruby. " [59]
- vs Ruby: https://softwaredoug.com/blog/2021/11/12/ruby-vs-python-for-loop.html
- vs Lisp "at Google ... I saw, pretty much for the first time in my life, people being as productive and more in other languages as I was in Lisp. What's more, once I got knocked off my high horse ... and actually bothered to really study some of these other languages I found *myself* suddenly becoming more productive in other languages than I was in Lisp. For example, my language of choice for doing Web development now is Python." -- Erann Gat
- vs Golang: https://yosefk.com/blog/things-from-python-id-miss-in-go.html
- vs Rust: "For me, I’m currently running a Rust webapp and a Python webapp on a server; both are of similar complexities, and I’m probably a bit better at Python. I keep on having to fix the Python app, because of both OS upgrades and problems with the implementation. The Rust app has been working without any issues for the last three years. For a relatively simple web server, where I don’t need to collaborate with anyone, I’m choosing Rust." -- [60]
- vs Golang and Python: "I've got a similar thing. I wrote a bespoke time-series database in Rust a few years ago, and it has had exactly one issue since I stood it up in production, and that was due to pessimistic filesystem access patterns, rather than the language. This thing is handling hundreds of thousands of inserts per second, and it's even threaded. Given that I've been programming professionally for over a decade in Python, Perl, Ruby, C, C++, Javascript, Java, and Rust, I'll pick Rust absolutely any time that I want something running that I won't get called at 3 AM to fix. It probably took me 5 times as long to write it as if I did it in Go or Python, but I guarantee it's saved me 10 times as much time I would have otherwise spent triaging, debugging, and running disaster recovery." -- [61]
- vs PHP, Ruby, Node, Java as of 2012: "What did Python have relative to other languages that inspired these developers to devote so much energy into their projects? They could have chosen PHP, Ruby, Node, Go, Java, etc. If I think back to ~2012 I remember that compared to other languages Python had: 1) Decent package management and "virtualization" (via virtualenv) 2) Nice balance of expressiveness and strictness 3) Was fun to write, hard to explain but it was the vibe 4) Decent enough performance for most things 5) Lots of developer tooling for debugging and stuff It felt like the least messy of the scripting languages.
- Everyone knows all the issues PHP had, no need to list it all here
- Ruby was cool but there were a thousand ways to do the same thing
- Node was inheriting that weird web language with all its quirks
- Go was just in its infancy
- Java was the kitchen sink " -- [62]
- " Ruby had all of your points 1-5. I don't think the difference was 'a thousand ways to do the same thing' (python has many ways to do the same thing also, and ruby's approach made it 'more fun') but some of the core libraries - especially things like numpy, pandas, scipy, etc. that both increased performance, simplified a wide variety of tasks and also attracted a broader range of users (mostly the 'scientific community'). " -- [63]
- " Ruby has comparatively crappy documentation (still does, arguably worse now because the Pickaxe is no longer updated), and also had a very weak non-Rails English-language community, by comparison to Python (by rumor, it had a much stronger Japanese language community, but I can't attest to that.) If you were doing something nontrivial where the interesting part weren't the kind of things Rails/ActiveSupport addressed, those factors alone (before even considering language/library features) made it a lot more of an uphill climb with Ruby. " -- [64]
- vs Haskell: https://mazzo.li/posts/haskell-readability.html
- vs Golang: "To be honest, I’d still probably reach for Python first for throwaway scripts, because of its terser syntax, list (and other) comprehensions, and exception handling by default. However, for anything more than a throwaway script, I’d quickly move to Go. Its standard library is better-designed, its io.Reader and io.Writer interfaces are excellent, and its lightweight static typing helps catch bugs without getting in the way." -- https://benhoyt.com/writings/gogit/
- vs Rust and F#: "In any language with native async infrastructure built-in, I’ve had to learn how it works pretty intimately to effectively use it. My worst experiences have been with Python’s asyncio while the easiest was probably F#." -- [65]
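On the comparison earlier in this list about attributes initialized by the constructor: on Python 3.7 and later, the standard library's `dataclasses` module generates `__init__` (along with `__repr__` and `__eq__`) from annotated class attributes. A minimal sketch, reusing the `Point` example from that quote; the `int` annotations are an assumption made for illustration:

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Annotated class attributes become constructor parameters.
    x: int
    y: int


p = Point(x=1, y=42)
same = Point(1, 42)  # positional arguments follow the declaration order
```

The generated `__eq__` compares instances field by field, so `p == same` holds here.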
Wishes
Features
'Properties' (getters and setters)
e.g. (thx http://www.ianbicking.org/property-decorator.html )
import math

class Velocity(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def speed(self):
        # magnitude of the velocity vector
        return math.sqrt(self.x**2 + self.y**2)

    @speed.setter
    def speed(self, value):
        # rescale the vector to the new speed, keeping its direction
        angle = math.atan2(self.x, self.y)
        self.x = math.sin(angle) * value
        self.y = math.cos(angle) * value

    @speed.deleter
    def speed(self):
        self.x = self.y = 0
Descriptor protocol
.__get__, .__set__, .__delete__; see also .__getattr__, .__dict__, etc (see https://docs.python.org/3.5/howto/descriptor.html )
see https://iluxonchik.github.io/why-you-should-learn-python/ (about halfway down) for some short usage examples
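As a rough illustration of the descriptor protocol, the sketch below defines a data descriptor that validates assignments. The class names `Positive` and `Order` are hypothetical, and `__set_name__` requires Python 3.6 or later (on older versions the storage name would have to be passed to the descriptor explicitly).

```python
class Positive:
    """Data descriptor that rejects non-positive values (hypothetical example)."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remember where to store the value.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)


class Order:
    quantity = Positive()

    def __init__(self, quantity):
        self.quantity = quantity  # routed through Positive.__set__


order = Order(3)
order.quantity = 5
```

Because `Positive` defines `__set__`, it takes precedence over the instance `__dict__` during lookup; the actual value lives under the separate key `_quantity`.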
Composition via getattr
eg (thx http://lgiordani.com/blog/2014/08/20/python-3-oop-part-3-delegation-composition-and-inheritance/ ):
class ComposedDoor:
    def __init__(self, number, status):
        self.door = Door(number, status)

    def __getattr__(self, attr):
        return getattr(self.door, attr)
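The snippet above assumes a `Door` class defined elsewhere. A self-contained sketch might look like the following, where the `Door` class and its `open`/`close` methods are stand-ins invented for illustration:

```python
class Door:
    # Hypothetical stand-in for the Door class that the snippet above assumes.
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = "open"

    def close(self):
        self.status = "closed"


class ComposedDoor:
    def __init__(self, number, status):
        self.door = Door(number, status)

    def __getattr__(self, attr):
        # Only called when normal lookup on ComposedDoor fails,
        # so self.door itself is found without recursion.
        return getattr(self.door, attr)


door = ComposedDoor(1, "closed")
door.open()  # delegated to the wrapped Door
```

Note that `__getattr__` only covers reads: assigning `door.status = ...` would set an attribute on the wrapper rather than on the wrapped `Door`, unless `__setattr__` is also overridden.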
Super
https://docs.python.org/2/library/functions.html#super
http://rhettinger.wordpress.com/2011/05/26/super-considered-super/
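A small sketch of cooperative multiple inheritance with `super()`, using the zero-argument form available in Python 3 (the Python 2 documentation linked above uses the explicit `super(Class, self)` form). The class names are made up for illustration.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # super() follows the MRO of the instance's class, not just Left's parent.
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


order = Child().describe()
mro_names = [cls.__name__ for cls in Child.__mro__]
```

For a `Child` instance, the `super()` call inside `Left` reaches `Right` before `Base`, because the next class is taken from `Child.__mro__`.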
Multiprocessing
http://www.jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/
Imports
https://tenthousandmeters.com/blog/python-behind-the-scenes-11-how-the-python-import-system-works/
discussion: https://news.ycombinator.com/item?id=27941208
Practices
namedtuple idiom for concise constructors:
from collections import namedtuple
e.g. class Position(namedtuple('Position', 'board score wc bc ep kp')):
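A minimal sketch of the idiom with a smaller, hypothetical class (`Point3` and its `norm1` method are invented for illustration): subclassing the generated namedtuple gives a constructor, field access by name, and immutability for free, while still leaving room for methods.

```python
from collections import namedtuple


class Point3(namedtuple("Point3", "x y z")):
    __slots__ = ()  # no per-instance __dict__, so instances stay as small as tuples

    def norm1(self):
        # Sum of absolute coordinates.
        return abs(self.x) + abs(self.y) + abs(self.z)


p = Point3(1, -2, 3)
moved = p._replace(x=10)  # returns a new instance; p is unchanged
```

Instances still behave as plain tuples, so they can be unpacked, compared, and used as dict keys.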
Interop
Opinions
- "I also view golang as the successor to Python. It's faster, modern, addresses concurrency, and makes things easier for developers....Python, unfortunately, is dead to me. The GIL is a deal-breaker..." -- https://news.ycombinator.com/item?id=9274328
- on the walrus operator:
- vs R:
- "The biggest hurdle to python right now is the stupid package managers. We need cargo for Python." systemvoltage
- "I think in general Python's biggest challenge is that it doesn't scale well. This is an agglomeration of issues around the same theme: bad packaging when there's a lot of cross-cutting dependencies, slow performance, no concurrency, typing as second-class citizens, etc. All of that is barely noticeable when you're just getting started on a small experimental project, but incredibly painful in large production systems. I strongly suspect that devs' satisfaction with Python is strongly correlated with the size of the codebase they're working on. Generally people using Python for one-off projects or self-contained tools tend to be pretty happy. People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language. What I've observed a lot is that many startups or greenfield projects start with Python to get an MVP out the door as fast as possible. Then as the scope of the software expands they feel increasingly bogged down and trapped in the language. " dcolkitt
- " And how much of those problems are an artefact of moving fast and getting things down. I've seen the exact same scenario with other languages. The problem is that in a start up environment you are likely adding and retiring more "features" at a speed that layers so much complexity that you can no longer reason about what business rules are actually valid any more. " Guther
- "I think that's part of it. There is a convention over configuration issue as well. A language like Go forces some patterns like package management and formatting unless you actively try to subvert it. It wouldn't surprise me if many of these issues are self-selecting in the language communities as well." EsotericAlgo
- "We use poetry for apps in production. At this point I think that's the winning solution and as it continues to grow and improve I think it will overtake all the others in this respect." kitanata
- "Yes, Poetry should be the blessed package manager." DangitBobby
- "Yes, Poetry is great! I avoided Python for a long time due to it's bad package management/environment handling situation, but Poetry solved all my problems there." hobofan
- " So I hear Poetry is the way to go these days for python. But a plurality of the people I encounter in the Clojure community came there because leiningen (Clojure's package manager that uses Maven under the covers) "just works" and they got tired of having a tough time reproducing builds consistently on other platforms / OSs with Python; not to mention the performance gains of the JVM." nickbauman
- "poetry feels like the closest equivalent to cargo that I've used. pipenv is better than the previous status quo but is still oddly unstable, with random new issues I encounter with every release. poetry "just works" for me, has better dependency resolution, and IMO has a nicer interface and terminal output to boot." meowface
- "I'd love to see Poetry take off. I'm watching it pretty closely." staticassertion
- "Same, we switched from pipenv ~6 months ago, had not had to worry about package/env related stuff since then, "just works". "evangelized" a sibling team also to consider switching, they were sceptical but just recently they mentioned they also like it more." kamyarg
- "Same here. Poetry has been a joy, after many bouts of frustration with pipenv." transcranial
- "Just a heads up to anyone who hasn’t looked recently: pipenv has been very actively worked on since earlier this year and has had four updates that fix a lot of issues. Earlier this year I would have said Poetry is better hands down, but after the updates and after using poetry and seeing some of its quirks, it’s a much closer matchup." [66]
- "pip in virtualenv. It is solved problem." robmurrer
- "90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv)." [67]
- vs. Ruby "I've found Ruby to be far more "learnable", in that once I learn how something works in Ruby, it will work that way in most every situation going forward, whereas Python has a lot of caveats to its functionality. This might just be due to my use cases, but the Ruby documentation is 100x more useful for getting me from point A to point B than anything I've seen from Python's docs. I still use Python daily, since its so ubiquitous, I just find myself having a difficult time using it for anything remotely complicated without a lot of extra research. " [68]
- vs Lisp: "Lisp is a much better choice if your primary goal is defining a full-blown domain specific language. For most other programming tasks, the two languages offer similar functionality in different way." -- Peter Norvig
- vs Lisp: "I think Lisp still has an edge for larger projects and for applications where the speed of the compiled code is important. But Python has the edge (with a large number of students) when the main goal is communication, not programming per se." Peter Norvig
- "I find exception handling in Python to be one of its weakest spots. I really like Python and I use it often. But I'm never quite sure which exceptions a library function might throw. This is rarely documented well and so you end up encountering new exceptions at run time, which is exactly when you do not want to be encountering new thrown exceptions. Usually I end up looking at source of libraries and making a list of all exceptions that can be thrown. But of course this is also terribly error prone." -- [69]
- vs Rust: "Dev velocity, which was supposed to be the claim to fame of Python, improved dramatically with Rust." -- [70]
- vs Golang and Rust: "I've been a professional Python developer for 15 years, and I can't believe Python ever had the reputation for "high dev velocity" beyond toy examples. In every real world code base I've worked in, Python has been a strict liability and the promise that you can "just rewrite the slow parts in C/C++/Numpy/etc" has never held (you very often spend at least as much time marshaling the data to the target language format than you gain by processing in the faster language and you have all of the maintainability problems of C/C++/Numpy). Python trades developer velocity for poor runtime performance. I don't think Rust is the paragon of high dev velocity languages either, but it seems to be more predictable. You don't have something that works as a prototype, but then you run into a bunch of performance problems (which virtually cannot be worked around) when you go to productionize, nor do you get all of the emergent quality and maintainability issues that come about from a dynamic type system (and last I checked, Python's optional static type system was still pretty alpha-grade). I strongly recommend avoiding Python for new projects. Use Go if you're looking for an easy GC language with max productivity and a decent performance ceiling. Use Rust if you're writing really high performance or correctness-is-paramount software. I'm sure there are other decent options as well (I've heard generally good things about TypeScript, but I'd be concerned about its performance even if it is better than Python)." -- [71]
- "Story -- I was a very early adopter of Python, back in the mid-90s. When other people wrote their CGI scripts in Perl, I always reached for Python. The first paid gig I ever had was a CGI script ("resume builder") I wrote in Python in 1996. But almost nobody was using it back then, and I'd get quizzical stares from people in job interviews etc. when it came up. So for many years back then I really really wanted to get a job working in Python. And I was super excited to get a job around the 2001 time frame in it on a pretty cool embedded Linux project. But my first experience working in a large codebase was disappointing. I could see right away that once many developers started working in that codebase together, best practices were really hard to enforce, and things started to fall apart into a bit of a mess, and lots of problems just showed up at runtime in bad explosions. That was the beginning of my disenchantment with dynamically typed, late-bound languages. They only give you the illusion of fast prototyping. You just shift your time to fixing type and binding errors later in your dev cycle. Python really improved a lot with the 2->3 transition, and the community has much better best practices now. But I think the core problem with late bound languages remains. All that said, I'm currently in another window working for my current employer wiring an embedded CPython interpreter into a Rust-based runtime... " -- [72]
- "As I understand it, the article talks about switching from python to rust in a non-trivial database service that is required to be fast and robust. I can't imagine python to do well in this regard, especially when C/C++ extensions are required to enhance the performance. I also agree that 'Python has been a strict liability and the promise that you can "just rewrite the slow parts in C/C++/Numpy/etc" has never held'. I just don't agree those points necessarily imply "avoiding Python for new projects". Even in the case of the original article, was it, in hindsight, a bad decision to start with python? I'd argue not necessarily. Python is still great for quickly hacking together something that works, not necessarily fast or maintainable, but if you're under time pressure or you're still trying to feel out the market fit for your product, it's not a great idea to waste brain cycles to deal with borrow checks for example. Python's weaknesses only show when you need to scale, both on the performance and on the lines of code / number of collaborators metric, which is a problem you have only when the project succeeds. Avoiding python because of these problems reeks of premature optimization, unless you know it's how the project will end up in advance. As other sibling comments point out, python is still great for small scripts and weekend projects. Perhaps you do mean 'avoiding Python for new projects that are expected to grow non-trivially', which is fine advice. I'm just a bit tired of all those over-reaching blanket statements people make regarding software design that only really makes sense in a limited context (here's a funny one I've heard recently: never write nested-for-loops because it's sloooww). It would be unfortunate to have the message "don't use python for large projects" be misinterpreted as "python sucks no matter what". " -- [73]
Internals and implementations
Links:
Core data structures: todo
Grammar
https://docs.python.org/3/reference/grammar.html
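Python's grammar was LL(1) through Python 3.8. Since Python 3.9, CPython uses a PEG-based parser (PEP 617), so the grammar is no longer restricted to LL(1).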
Note that Python's grammar cannot be represented as an operator precedence grammar [74].
Number representations
Integers
Floating points todo
In CPython, small integers (-5 through 256) are preallocated and shared as singletons [75].
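A quick way to observe the small-integer cache, as a sketch that assumes CPython (other implementations may behave differently). The values are built with int() at run time so that compile-time constant sharing does not mask the effect; the helper name same_object is made up for this example.

```python
def same_object(text):
    """Build the same integer twice at run time and compare identity."""
    first = int(text)
    second = int(text)
    return first is second


# CPython-specific behaviour: values inside the small-integer cache
# are represented by one shared object, values outside it are not.
print("256 shared:", same_object("256"))
print("-5 shared:", same_object("-5"))
print("257 shared:", same_object("257"))
print("-6 shared:", same_object("-6"))
```

This is also why comparing numbers with `is` instead of `==` can appear to work for small values and then fail for larger ones.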
array representation
variable-length lists: todo
multidimensional arrays: todo
limits on sizes of the above
string representation
Python strings are immutable [76]
Since Python 3.3 (PEP 393), Python's Unicode string type has been able to "support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes). This will allow a space-efficient representation in common cases, but give access to full UCS-4 on all systems." [77] [78]
" Unicode structures are now defined as a hierarchy of structures, namely:
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:2;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
    } state;
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyCompactUnicodeObject;

typedef struct {
    PyCompactUnicodeObject _base;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;
} PyUnicodeObject;
" -- [79]
how strings used to be stored in older versions of Python: http://www.laurentluce.com/posts/python-string-objects-implementation/
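The 1, 2, or 4 bytes per character of the flexible representation can be observed from Python itself. This is a minimal sketch assuming CPython 3.3 or later; the helper bytes_per_char is a name invented here, and it cancels out the fixed object header by measuring two strings of different lengths.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by differencing two string sizes."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = [
    ("ASCII", "a"),              # fits in 1 byte
    ("Latin-1", "\u00e9"),       # still 1 byte, but not ASCII
    ("BMP", "\u0416"),           # needs 2 bytes
    ("astral", "\U0001F600"),    # needs 4 bytes
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

A single wide character is enough to switch the whole string to the wider representation, because the width is chosen from the largest code point present.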
Python string max length is given by Py_ssize_t [80] (which, interestingly, is signed, because many places in Python's implementation internally use negative lengths to denote counting from the end [81])
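The Py_ssize_t limit is visible from Python as sys.maxsize. The sketch below assumes CPython and only shows Python-level behaviour; the helper name repeat_overflows is hypothetical.

```python
import sys

# sys.maxsize is the largest value a Py_ssize_t can hold on this build.
print(sys.maxsize)


def repeat_overflows(count):
    """Return True if a repeat count is rejected before any allocation."""
    try:
        "a" * count
    except OverflowError:
        return True
    return False


# One past the limit cannot be converted to a Py_ssize_t at all.
print(repeat_overflows(sys.maxsize + 1))

# At the Python level, negative indices count from the end of a sequence.
print("python"[-1])
```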
Representation of structures with fields
todo
some examples in https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
Hashing and equality checks
Memory management
"There’s no chance that the reference count can overflow; at least as many bits are used to hold the reference count as there are distinct memory locations in virtual memory (assuming sizeof(Py_ssize_t) >= sizeof(void*)). Thus, the reference count increment is a simple operation." -- https://docs.python.org/2/c-api/intro.html
"Every Python object contains at least a refcount and a reference to the object's type in addition to other storage; on a 64-bit machine, that takes up 16 bytes" -- http://stackoverflow.com/questions/10365624/sys-getsizeofint-returns-an-unreasonably-large-value/10365639#10365639
Examples of some Python objects and how much memory they take (in 64-bit Python 2.7x; from [82]):
- None: 16 bytes
- integer: 24 bytes
- that's probably 8 bytes for the int, 8 bytes for a pointer to the type, and 8 bytes for the refcount [83]
- empty string: 37 bytes
- empty list: 72 bytes
- empty dict: 280 bytes
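To get the corresponding figures for whatever interpreter is at hand, sys.getsizeof can be used. This is only a sketch: the numbers depend on the Python version and platform, and on Python 3 they will generally differ from the 2.7 figures listed above.

```python
import sys

samples = {
    "None": None,
    "integer": 1,
    "empty string": "",
    "empty list": [],
    "empty dict": {},
}

# getsizeof reports the object's own size, not the size of what it refers to.
sizes = {label: sys.getsizeof(obj) for label, obj in samples.items()}
for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```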
Links:
Concurrency and the GIL
pypy-stm appears to be one of the latest attempts to remove the GIL from Python. (Stackless didn't remove the GIL; it was about microtasks. PyPy incorporated much of Stackless but also didn't remove the GIL.)
- https://docs.google.com/document/u/0/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/mobilebasic
- "A viable solution for Python concurrency" describes Sam Gross's proposal to get rid of the GIL via "biased reference counts", where "the reference count in each object is split in two, with one "local" count for the owner (creator) of the object and a shared count for all other threads. Since the owner has exclusive access to its count, increments and decrements can be done with fast, non-atomic instructions. Any other thread accessing the object will use atomic operations on the shared reference count."
- https://lukasz.langa.pl/5d044f91-49c1-4170-aed1-62b6763e6ad0/
- https://news.ycombinator.com/item?id=29005573
- "Introduction to nogil: Sam published his code alongside a detailed write-up where he explains the motivation and design of his fork. The design can be summarized as:
  - replacement of Python’s built-in allocator pymalloc with mimalloc for thread safety, including cooperation required for lockless read access to dictionaries and other collections, and efficiency (heap memory layout allows finding GC-tracked objects without having to maintain an explicit list);
  - replacement of Python’s non-atomic eager reference counting with biased reference counting that:
    - ties each object with the thread that created it (called the owner thread);
    - enables efficient non-atomic local reference counting within the owner thread;
    - allows for slower but atomic shared reference counting in other threads;
  - to speed up object access across threads (otherwise slowed down by the atomic shared reference counting), two techniques are employed:
    - some special objects are immortalized, meaning their reference counts are never computed and they are never deallocated: this applies to singletons like None, True, False, other small integers and interned strings, as well as statically allocated PyTypeObjects for built-in types;
    - deferred reference counting is used for other globally accessible objects like top-level functions, code objects, and modules; those don’t use immortalization as they aren’t always kept for the lifetime of the program;
  - adjustment of the cyclic garbage collector to become a single-threaded stop-the-world garbage collector that:
    - waits for all threads to pause at a safe point (any bytecode boundary);
    - doesn’t wait for threads that are blocked on I/O (and use PyEval_ReleaseThread, an equivalent of releasing the GIL in current Python);
    - efficiently constructs the list of objects to deallocate just-in-time: thanks to using mimalloc, GC-tracked objects are all kept in a separate lightweight heap;
  - relocation of the process-global MRO cache to thread-local to avoid contention on MRO lookups; invalidations are still global;
  - modification of built-in collections to become thread-safe.
  Sam’s design document contains detail into how those design elements operate, as well as information on thread states and the GIL API, other interpreter and bytecode modifications (replacement of the stack VM with a register VM with accumulator register; optimized function calls by avoiding creation of C stack frames; other changes to ceval.c; usage of tagged pointers; thread-safe metadata for LOAD_ATTR, LOAD_METHOD, LOAD_GLOBAL opcodes; and more). I encourage you to read it in its entirety." -- [84]
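The split between an owner-local count and a shared count can be modelled in a few lines. This is a toy sketch only, not the nogil fork's actual data layout: the class BiasedRefCount and its fields are invented for illustration, a threading.Lock stands in for atomic instructions, and the real design has further steps (such as merging the two counts) that are not modelled.

```python
import threading


class BiasedRefCount:
    """Toy model: one count for the owner thread, one for everyone else."""

    def __init__(self):
        self.owner = threading.get_ident()    # thread that created the object
        self.local = 1                        # only the owner touches this: no locking
        self.shared = 0                       # touched by all other threads
        self._shared_lock = threading.Lock()  # stand-in for atomic operations

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1                   # fast, non-atomic path
        else:
            with self._shared_lock:           # slower, synchronized path
                self.shared += 1

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._shared_lock:
                self.shared -= 1

    def total(self):
        with self._shared_lock:
            return self.local + self.shared


rc = BiasedRefCount()
rc.incref()                                   # taken on the owner thread


def borrow():
    for _ in range(3):
        rc.incref()                           # taken on a non-owner thread


worker = threading.Thread(target=borrow)
worker.start()
worker.join()
print(rc.local, rc.shared, rc.total())
```

The point of the split is that the common case, an object used only by the thread that created it, never pays for synchronization.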
Compiled format
Variable lookup details
"Python...maps variable names to integer indices during compilation to bytecode, and the bytecode just takes those embedded constant indices and indexes into an array to obtain a local variable's value. That's a lot faster." -- Gergo Barany
and metaprogramming interaction:
Q: "how does CPython resolve scope if it maps variable names to indices? In the case of `exec(input())` and say the input string is `x = 1`, how would it compile bytecode to allocate space for x and index into the value?" [85]
A: "In Python 2, the bytecode optimization that lets you access variables by index is turned off if your function has exec code in it that may modify locals.
So if your function is:
def foo(): exec "a=1"; return a
Then running dis.dis on foo to disassemble the bytecode it you will see:
8 LOAD_NAME 0 (a)
while you normally would see:
6 LOAD_FAST 0 (a)
... " -- Erwin
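The same question can be explored in Python 3, where exec is an ordinary function. The sketch below assumes CPython; the function names are made up, and exact opcode names vary between versions (which is why the code only checks opcode name prefixes). It shows that local slots are fixed at compile time, so code run by exec cannot add one, and that passing an explicit namespace dict is a dependable way to get values back out.

```python
import dis


def add_one(x):
    y = x + 1
    return y


# Local names are fixed at compile time; bytecode refers to them by position.
print(add_one.__code__.co_varnames)
fast_ops = [
    ins.opname
    for ins in dis.get_instructions(add_one)
    if ins.opname.startswith(("LOAD_FAST", "STORE_FAST"))
]
print(fast_ops)


def naive():
    exec("made_by_exec = 1")
    return made_by_exec  # compiled as a global lookup: no local slot exists


def with_namespace():
    namespace = {}
    exec("made_by_exec = 1", {}, namespace)  # exec writes into our dict
    return namespace["made_by_exec"]


try:
    naive()
except NameError as exc:
    print("NameError:", exc)
print(with_namespace())
```

As far as I understand, Python 3.13 (PEP 667) made locals() return independent snapshots inside functions, so tricks that read exec's results back through a later locals() call may stop working there; the explicit dict does not depend on that behaviour.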
If you try the exec example above in Python 3, note the scoping details:
" speedster217:
I just tested it in Python3:
    def foo():
        exec("a=1")
        return a
    print(foo())
Fails with a NameError:
    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        print(foo())
      File "test.py", line 3, in foo
        return a
    NameError: name 'a' is not defined
joncatanio:
I get the same error with your example, but this works fine (Python 3.6.4):
    exec("a = 1")
    print(a)
This will print "1".
yorwba:
That is because the exec runs in the global scope. When Python sees a variable that is not assigned to in the local scope, it is assumed to be a global variable, so when exec creates a new local variable, the load still fails because it looks into the globals dictionary.
But you can do this:
    def foo():
        exec("a = 1")
        return locals()['a']
"
Python Internals retrospectives and studies
" In this study we attempt to quantify the costs of language features such as dynamic typing, reference counting for memory management, boxing of numbers, and late binding of function calls
...
We find that a boxed representation of numbers as heap objects is the single most costly language feature on numeric codes, accounting for up to 43 % of total execution time in our benchmark set. On symbolic object-oriented code, late binding of function and method calls costs up to 30 %. Redundant reference counting, dynamic type checks, and Python’s elaborate function calling convention have comparatively smaller costs.
...
The optimizations performed by pylibjit on request aim directly at eliminating the sources of overheads discussed above.
...
Unboxing of numbers uses type annotations to directly use machine operations on integers and floating-point numbers that fit into machine registers. This avoids storing every number in a heap object and managing that memory. As a special case, this also turns loops of the very common form for i in range(n), which would normally use an iterator to enumerate boxed numbers from 0 to n, into unboxed counting loops if i is an unboxed machine integer.
...
Early binding of function calls resolves the addresses of compiled or built-in functions or methods at compile time (guided by type annotations for methods’ receiver objects) and generates direct CPU call instructions. " -- Python Interpreter Performance Deconstructed by Gergö Barany
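The difference between boxed and unboxed numbers can be seen with the standard library alone. This sketch is not the paper's pylibjit; it assumes CPython and simply contrasts a list, which holds pointers to separate int objects, with array.array, which stores raw 64-bit machine integers.

```python
import array
import sys

boxed = list(range(1000, 2000))       # each element is a separate heap object
unboxed = array.array("q", boxed)     # contiguous signed 64-bit integers

per_int_object = sys.getsizeof(boxed[0])
print("bytes per boxed int object:", per_int_object)
print("bytes per unboxed element:", unboxed.itemsize)

# Same values either way; only the representation differs.
print(sum(boxed) == sum(unboxed))
```

The list additionally stores one pointer per element, so the boxed form costs more than the size of the int objects alone.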
Some issues with the CPython C API
Python Internals links
Extensions
Implementations and variants
Implementation: Cython
"I use it for some things and straight C++ for others. Cython is good for simple things, like small tight loops. One thing that is much more reliable in pure C++ is any kind of threading or OpenMP etc. Theoretically Cython has it, in practice it can cause very weird problems. Also if you want to use C (or C++) libs in Cython, you have to manually declare all the function prototypes before using them. Also Cython has a tendency to make minor breaking changes to syntax every version, breaking my code and/or making it version-dependent. Since I distribute to others, the best method for linking performance intensive-code to Python I have found is: 1. Write the core in C++ 2. extern "C" a simple C API for it 3. Compile both to a SO 4. Access the SO from ctypes. It is more robust than Cython IME. Another advantage of this approach is that you have now have a generic C library you can use anywhere else if you want, not just Python. And you don't have to link -lpython so the C lib can be used in places where Python isn't installed. Finally, it can be nice to have some C++ binaries for startup time, and AFAIK you can't compile binaries with setuptools." [86]
Lists of Implementations and variants
Taichi variant / DSL / extension / language
https://www.taichi-lang.org/
https://docs.taichi-lang.org/blog/accelerate-python-code-100x
concurrency / GPU-focused numerics extension
PyPy implementation
https://pypy.org/
Discussion on incremental garbage collection (GC) in PyPy:
Discussion on PyPy-CPython extension interop:
PyPyJS
http://pypyjs.org/
Stackless implementation
Stackless (see also [87], [88], [89], [90])
TinyPy variant
tinypy (about 80k of code)
PyMite (embedded) variant
PyMite
MicroPython (embedded) variant
http://micropython.org/
Brython variant
https://brython.info/
"A Python 3 implementation for client-side web programming...Brython is designed to replace Javascript as the scripting language for the Web"
"It renders quite well a lot of Python features. Even things like yield from, sets, generator.send(), dict.fromkeys, multiple inheritance and 0b1100101. But there are several problems with brython: - it can't render some obscure Python features. It's very hard to do a complete Python implementation. E.g: metaclass will fail." [91]
some discussion here:
RustPython variant
https://github.com/RustPython/RustPython
Chocopy variant
https://chocopy.org/
Mypyc variant
https://github.com/python/mypy/tree/master/mypyc
- "Mypyc compiles what is essentially a Python language variant using "strict" semantics. This means (among some other things):
- Most type annotations are enforced at runtime (raising TypeError on mismatch)
- Classes are compiled into extension classes without __dict__ (much, but not quite, like if they used __slots__)
- Monkey patching doesn't work
- Instance attributes won't fall back to class attributes if undefined
- Metaclasses not supported"
- "There are several python type checkers, and while they do aim for consistency with the relevant PEPs, they do not behave identically. See, for instance, this paper detailing pytype and mypy’s differing views of python’s type system." -- https://google.github.io/pytype/typing_faq.html
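The mypyc notes above compare compiled classes to classes that use __slots__. For reference, this is what __slots__ does in ordinary CPython; it is a sketch of the analogy only, it does not use mypyc, and the Point class is invented for the example.

```python
class Point:
    """A class with fixed attribute storage instead of a per-instance dict."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(hasattr(p, "__dict__"))   # no per-instance dictionary

try:
    p.z = 3                     # adding an undeclared attribute is rejected
except AttributeError as exc:
    print("AttributeError:", exc)
```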
Skylark (now Starlark) variant
https://docs.bazel.build/versions/master/skylark/language.html (was: https://docs.bazel.build/versions/master/skylark/language.html )
https://github.com/bazelbuild/starlark
Snek variant
Snek https://sneklang.org/doc/snek.html
"Snek is a tiny embeddable language targeting processors with only a few kB of flash and ram. Think of something that would have been running BASIC years ago and you'll have the idea. These processors are too small to run MicroPython. Snek borrows semantics and syntax from python, but only provides a tiny subset of that large language. The goal is to have Snek programs able to run in a full Python (version 3) implementation so that any knowledge gained in learning Snek will transfer directly to learning Python."
MyCpp variant
https://github.com/oilshell/oil/blob/master/mycpp/README.md "A tool that translates a subset of statically-typed Python to C++." "an experimental Python-to-C++ translator based on MyPy. It only handles the small subset of Python that Oil uses."
Mycpp
Shedskin variant
http://shedskin.github.io/
- https://en.wikipedia.org/wiki/Shed_Skin "Besides the typing restriction,[2] programs cannot freely use the Python standard library, although about 20 common modules, such as random, itertools and re (regular expressions), are supported as of 2011. Also, not all Python features, such as nested functions and variable numbers of arguments, are supported. Many introspective dynamic parts of the language are unsupported. For example, functions like getattr, and hasattr are unsupported. As of May 2011, Unicode is not supported.[3] "
- https://shedskin.readthedocs.io/en/latest/documentation.html#introduction
- https://shedskin.readthedocs.io/en/latest/documentation.html#python-subset-restrictions
- "The following common features are currently not supported:
eval, getattr, hasattr, isinstance, anything really dynamic
arbitrary-size arithmetic (integers become 32-bit (signed) by default on most architectures, see Command-line options)
argument (un)packing (*args and **kwargs)
multiple inheritance
nested functions and classes
unicode
inheritance from builtins (excluding Exception and object)
overloading __iter__, __call__, __del__
closures
Some other features are currently only partially supported:
class attributes must always be accessed using a class identifier:
self.class_attr # bad
SomeClass.class_attr # good
SomeClass.some_static_method() # good
function references can be passed around, but not method references or class references, and they cannot be contained:
..."
Please BUILD variant
Please BUILD
- subinclude for imports
- types: Integers (all integers are 64-bit signed integers), Strings, Lists, Dictionaries, Functions, Booleans (no floats or classes)
- builtin functions: https://please.build/lexicon.html#python-builtins
- len(x) enumerate(seq) zip(x, y, ...) isinstance(x, type) range([start, ]stop[, step]) any(seq) all(seq) sorted(seq) package_name() subrepo_name() join_path(x, ...) split_path(path) splitext(filename) basename(path) dirname(path) breakpoint()
- string builtins: join(seq) split(sep) replace(old, new) partition(sep) rpartition(sep) startswith(prefix) endswith(suffix) format(arg1=val1, arg2=val2, ...) lstrip(cutset) rstrip(cutset) strip(cutset) removeprefix(prefix) removesuffix(suffix) find(needle) rfind(needle) count(needle) upper() lower()
- dict builtins: get(key[, default]) setdefault(key[, default]) keys() values() items() copy()
- continue for if return raise assert def = += . [] lambda {}
- + - * / % < > and or is "is not" in not in == != >= <=
- compared to Python, lacks: "the import, try, except, finally, class, global, nonlocal, while and async keywords"
- "User-defined varargs and kwargs functions are not supported."
- "PEP-498 style "f-string" interpolation is available, but it is deliberately much more limited than in Python; it can only interpolate variable names rather than arbitrary expressions."
- "Dictionaries are somewhat restricted in function; they may only be keyed by strings and cannot be iterated directly - i.e. one must use keys(), values() or items(). The results of all these functions are always consistently ordered. They support PEP-584 style unions (although not the
Pycopy variant
https://github.com/pfalcon/pycopy
https://github.com/pycopy/PycoEPs/blob/master/StrictMode.md
py2many transpiler variant
https://github.com/adsharma/py2many/blob/main/doc/langspec.md
Kuroko variant
https://github.com/kuroko-lang/kuroko
"Dialect of Python with explicit variable declaration and block scoping, with a lightweight and easy-to-embed bytecode compiler and interpreter." [92]
"Importantly, it avoids a few gotchas in Python such as default parameters, and scoping rules, but stays generally compatible with Python." [93]
Oil's old OVM
Oil used to reuse a subset of the Python virtual machine implementation. These pages indicate what is in the subset:
context: https://lobste.rs/s/p8kn8u/how_python_was_shaped_by_leaky_internals#c_nroxen
Violet (Python in Swift implementation)
https://forums.swift.org/t/violet-python-vm-written-in-swift/56945
discussion:
CPython compiled to webassembly (wasm) implementations
some discussion here:
Cannoli
https://github.com/joncatanio/cannoli
"Cannoli is a compiler for a subset of Python 3.6.5 and is designed to evaluate the language features of Python that negatively impact performance. ... Cannoli supports a subset of Python 3.6.5, its current state omits many features that could not be completed during the duration of the thesis. The main omissions are exceptions and inheritance. ... Cannoli supports two major optimizations that come as a result of applying restrictions to the language. Restrictions are placed on the Python features that provide the ability to delete or inject scope elements and the ability to mutate the structure of objects and classes at run time. "
https://digitalcommons.calpoly.edu/theses/1886/
https://news.ycombinator.com/item?id=17093051
Minimal Viable Python variant
https://snarky.ca/mvpy-minimum-viable-python/
A core language of 15 constructs that the rest of Python can be lowered to (according to this author).
- Integers (as the base for other literals like bytes)
- Floats (because I didn't want to mess with getting the accuracy wrong)
- Function calls
- =
- :=
- Function definitions
- global
- nonlocal
- return
- yield
- lambda
- del
- try/except
- if
- while
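To give a feel for what "lowering" means here, the sketch below rewrites a for loop using only constructs from the list above (while, try/except, if, assignment, function calls). It is an illustration, not the author's exact desugaring: the function names are invented, and for readability it still uses + and True/False, which a full lowering would also reduce further.

```python
def total_with_for(items):
    total = 0
    for x in items:
        total = total + x
    return total


def total_lowered(items):
    """Same behaviour, with the for loop expressed via while and try/except."""
    total = 0
    iterator = iter(items)
    running = True
    while running:
        try:
            x = next(iterator)
        except StopIteration:
            running = False     # iterator exhausted: leave the loop
        if running:
            total = total + x   # loop body
    return total


print(total_with_for([1, 2, 3]), total_lowered([1, 2, 3]))
```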
S6 optimized implementation
https://github.com/deepmind/s6
Mojo variant
https://www.modular.com/mojo
Misc links