Table of Contents for Programming Languages: a survey
Languages for scientific, numerical, or mathematical computation
MATLAB and Octave
Tutorials:
Internals and implementations
Core data structures: todo
Number representations
Integers
Floating point: todo
array representation
variable-length lists: todo
multidimensional arrays: todo
limits on sizes of the above
string representation
Representation of structures with fields: todo
R
features:
- "r functions are very similar to fexprs" [1]
- "R has some really crazy metaprogramming facilities. This might sound strange coming from Python, which is already very dynamic - but R adds arbitrary infix operators, code-as-data, and environments (as in, collections of bindings, as used by variables and closures) as first class objects. On top of that, in R, argument passing in function calls is call-by-name-and-lazy-value - meaning that for every argument, the function can either just treat it as a simple value (same semantics as normal pass-by-value, except evaluation is deferred until the first use), or it can obtain the entire expression used at the point of the call, and try to creatively interpret it. This all makes it possible to do really impressive things with syntax that are implemented as pure libraries, with no changes to the main language." -- [2]
- "In R, everything is an expression, and every expression is a function call. Even things like assignments, if/else, or function definitions themselves, are function calls, with C-like syntactic sugar on top. You don't have to use that sugar, though! And all those function calls are represented as "pairlists", which is to say, linked lists. Exactly like an S-expr would - first element is the name being invoked, and the rest are arguments. And you can do all the same things with them - construct them at runtime, or modify existing ones, macro-style. So in that sense, R is actually pretty much just Lisp with lazy argument evaluation (which makes special forms unnecessary, since they can all be done as functions), and syntax sugar on top. Where it really deviates is the data/object model, with arrays and auto-vectorization everywhere... Something like that would be called a FEXPR in Lisp. " [3] [4]
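The features described in these quotes can be sketched in a few lines of base R (the helper name `show_expr` is mine): arguments arrive as lazy promises whose call-site expression is recoverable, and every expression is a call stored as Lisp-style list data that can be rewritten and evaluated at runtime.

```r
# Lazy promises: a function body can recover the unevaluated call-site
# expression; note that `a` and `b` need not even exist.
show_expr <- function(x) deparse(substitute(x))
stopifnot(show_expr(a + b * 2) == "a + b * 2")

# Code as data: every expression is a call, stored as a Lisp-style
# (pair)list whose first element is the function being invoked.
e <- quote(1 + 2)
stopifnot(is.call(e), identical(e[[1]], as.name("+")))
e[[1]] <- as.name("*")        # rewrite the call, macro-style
stopifnot(eval(e) == 2)       # now evaluates 1 * 2

# Even arithmetic and `if` are function calls under the syntax sugar.
stopifnot(`+`(1, 2) == 3)
stopifnot(eval(call("if", TRUE, 10, 20)) == 10)
```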
free tutorials:
books:
best practices and style guides:
libraries:
Retrospectives:
Opinions:
- "I think R is a great language for certain applications - namely statistics and some data analysis. " -- [5]
- "I strongly believe that R has excellent features to support data science: non-standard evaluation, NAs at a very low level, data frames, first class functions, ..." -- [6]
- the type and object system
- "I don't think R's OO is great; but it's mostly misunderstood because it's fundamentally different to most popular modern languages. Interestingly, Julia uses exactly the same model as R (i.e. generic functions, not message passing)" [7]
- "My main criticism is around two things. One, "clunkiness" of class creation (which I guess is subjective); I don't think it's very readable, but maybe that's bias from other languages. Second the S3/S4/R4 mess." -- https://news.ycombinator.com/item?id=9784471
- "I'd say R is a _terrible_ language. Its types are just really different from every major programming language, and it's horrible for an experienced programmer to use." [8]
- "I have to disagree. Its main model is generic function method dispatching. It can feel odd at first to someone coming from the C++ style of OO where objects own methods, not methods owning objects. But it's a legitimate OO style with its own advantages. [9]. I've found the more I use R, the more intuitive a lot of its operations are. It's relatively easy to "guess" what you ought to do to accomplish what you want. More so then other languages I've learned." https://news.ycombinator.com/item?id=10388416
- "Its type system seems very complicated and while the language tries to do what it thinks you want, it's not always clear what is going on (are you working on a matrix or a dataframe that has been cast into a matrix?). ...R is one of those languages that is good in a certain domain, but once you get out of that domain, it makes things more complicated than they need to be. Normally for a language design, you aim to make easy things easy, and difficult things possible. For R, it seems like it makes difficult things easy and easy things difficult." [10]
- "it has 4 (four) objects systems which differ in subtle ways between each other. It's programmers' nightmare. It's not the worst language in the world, but it isn't terrific language either." [11]
- "I would say that any language that does not have a facility to get the path of the current file, is not 'excellent' under the criteria an experienced programmer would use for assessing it." [12]
- "I'm not qualified to comment on how good or bad a language R is. But it is maddening how package developers don't follow some convention for naming functions. I load a package that I haven't used recently and I know the function I want but can't remember if it is called my_function, myFunction, my.function, or MyFunction?" [13]
- "R does have some pretty cool tools for after the fact debugging, like dump.frames, but few people know about them." -- [14]
- "I still feel like the many great features of R (as you said some time ago, NA handling, data frames, etc.) are sometimes outweighed by the cons." [15]
- "The language was designed for data analysis, and has some quirks (like the way data structures are indexed and have to be stored in physical memory)..." -- [16]
- "This is my biggest beef with R. It is constantly changing the dimensions and types of your data without telling you. Want to grab some subset of the rows of a matrix? Better add some extra post-processing in case there's only one row that satisfies your query, or else R will change its type!... (reply: You know that you can tell it not to do that, right? drop=FALSE ; reply to the reply: I think this only reinforces my point: this is a ridiculous default. another reply: use dplyr's tbl_df wrapper around data.frame)" -- https://news.ycombinator.com/item?id=9267362
- "...R language is such a mess. For example, R has lazy evaluation despite being an imperative stateful language." [17]
- "R's syntax is not the problem, the "standard" library is the biggest problem in its inconsistencies. That said, the other big area of complaint in R is the type system. We are too often having to coerce types, but I'm not exactly sure of the solution for that." [18]
- "I find R's syntax to be a big hangup for new learners, especially on indexing and apply-to-each (sapply, mapply, just plain apply...), but dplyr really makes life much easier. The %>% operator alone (which to be fair was originally from magrittr) is a great help. Not sure if this is my personal biases, but I always find it easier to read calls chained postfix-style." -- [19]
- http://arrgh.tim-smith.us/index.html
- "importing a package takes a meaningful amount of time in R. Several seconds, that is just unacceptable." -- [20]
- "I have to fight with R on scientific notation, always copy - pasting into my code: options(scipen=999)" -- [21]
- "its a personal matter, but R has syntaxes that get on my nerves. python list: a = [1,2,3] a = c(1,2,3). perhaps its because i used other languages before, but my fingers are more adept at hitting [ which requires no shift compared to (. some people love curly braces and lots of parentheses in if/for statements, I appreciate them not being there." -- [22]
- "Dependency management, in my opinion, is one of the problems in the R ecosystem. The lack of name spaces when calling functions has made the community have many little packages that only do one thing and you are not really sure where it was actually used, unless you know the code and the package. An example is the janitor::clean_names function I like to use for standardizing the column names on a data.frame." [25]
- "R is a disaster when you want to write programs as you would in a real programming language. R is an excellent choice for what it is used most of the time by these people whose education/training isn't related to programming: interactive analysis of data and (maybe) writing prototypes." [26]
- "The thing to keep in mind is that, from the point of view of someone who works with data, R isn't a programming language. It's a statistical software package that has a programming language. Its competitors are things like Minitab, SPSS, Stata, and JMP, all of which used to be entirely menu-driven. R was a genuine innovation when it was first introduced." [27]
- Evaluating the Design of the R Language
- " R is terrible, and especially so for non-professional programmers, and it is an absolute disaster for the applications where it routinely gets used, namely statistics for scientific applications. The reason is its strong tendency to fail silently (and, with RStudio, to frequently keep going even when it does fail.)" [28]
- "...if I have to analyze a .csv quickly, I'm going for R most of the time." [29]
- "...nothing beats R for getting to an answer as fast as possible (not even Python) at the cost of making it more difficult to productionise a solution in pure R." [30]
- "Most people I know who use R don't care a whit about production. They run an analysis to answer hypotheses." [31]
- " I agree that R shouldn't be used in production, but R is great for prototyping different analytical models before porting them over to Python or another language. " [32]
- "Another feature for this audience is the philosophy that functions shouldn't have side effects. You can still do (several types) of object oriented programming in R, but it does take away some of the ways in which non-programmers shoot themselves in the foot. I've come to really like the way environments work in R, as well." [33]
- "A sane language shouldn’t need three different object systems." [34]
- "Meh, S3 is nice and lightweight for a very particular kind of analysis interoperability. S4 isn't super useful IMHO. RC is very well thought out, and I've heard good things about R6. It might not be sane language design, but it works well for designing very different kinds of analytical procedures." [35]
- "As a long-time R user, I agree with all of these complaints. The language itself is ugly and actively tries to get in your way. I'll add that concepts like data frames are not really intrinsic, and you get needless complexities like "length", "nrow", "dim", each of which does the wrong thing in 90% of the scenarios of interest. The confusion of lvalues is another strange quirk -- a <- 0; length(a) <- 20 is totally valid, and you get things like class(a) <- 'foo' being preferred over the equivalent a$class <- foo. It has all sorts of odd concepts between lists and data.frames -- the double-bracket syntax, etc. The object model is very confusing, though most people seem to have converged on the S3 system, which is the oldest one. If you discipline yourself to learning "the good parts", especially by learning either data.tables or tidyverse or becoming a master of split/lapply/aggregate/ave, then it is very powerful. The modelling tools and plotting (both base graphics and ggplot2) are excellent. I'd love to see a NeoR arise at some point that fixes the strange historical inconsistencies (like what happens when you refer to vec[0], as noted by the author) in non-backward compatible ways." [36]
- https://blog.shotwell.ca/posts/why_i_use_r/
- https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/
- https://www.dataquest.io/blog/python-vs-r/
- "It's python that's really a mess for data science; you can't avoid it being a programming language first and a tool for data science a distant tenth. Syntax that only a programmer would like is necessary, and quite a bit of it at that. R is a much better fit for people who want to do statistics first, and as little programming as possible. Things like function parameters being promises make it far easier to deal with functions like optimizers where there really are 10+ tuning parameters or things you may want to tweak. Iterative languages are far easier to understand for people who don't want to be programmers. You cannot develop plyr or ggplot in a language agnostic way, because they need the purpose built syntax R has. Contrast to eg the fight in python to get an infix matrix multiplication operator." [37]
- "...I used R for statistics at college, but only base R, and it is verbose even for basic data manipulation. The scripts I made for my older data blog posts are effectively incomprehensible to me now. I ended up learning how to use Python/pandas/IPython because I had had enough and wanted a second option on how to do data analysis. Then the R package dplyr was released in 2013, alleviating most annoyances I had with R. dplyr/ggplot2 alone are strong reasons to stick with the R ecosystem. (not that Python is bad/worse; as I mention at the post, both ecosystems are worth knowing)" -- [38]
- "I use both (((R and Python))), but R for interactive analysis and reporting, Python for data transformations (ETL). While the syntax of Python is "cleaner" for backend scripts, R feels more straightforward when working with dataframes (dplyr) resulting in things to report on. The syntax for ggplot2 fits the same category. As much as having one languages for both categories would be nice, using both today seems like a better option. " -- [39]
- "...Python is way, way better for text. And I say that as a long-time R user. R really doesn't like things that can't be represented as datasets. " [40]
- "Yes a couple of years ago I did this project where I needed to get a ton of analysis done, typically with plots and tables as output. Did this in python and it was this huge mess with pandas and matplotlib. Re-did it in R with data.table and ggplot2 and it was just ridiculously easier, and I could expand upon the code much more easily, plus the output was much prettier."
- "If I need to run a quick analysis on a dataset, I'm grabbing R 9/10 times. If I'm building a production pipeline, I'm using Python 9/10 times." [41]
- "...this kind of code ((replying to a comment about 'answering questions in a rapid, interactive way')) is what R is optimized for, not general purpose programming (even though it can totally do it)." [42]
- "I think R-Studio (an R based IDE that turns it kinda into a more excel like experience) where you can inspect the data in memory (including matrix data) and graph making is where it really helps bring people into the R language. And with a set of instructions anyone can go load the analysis packages and do their data analysis. Compare this to python, where they have to go the unix shell set up the environment, load the libraries. When they come back reset everything and get back to where they started." [43]
- "...there is literally no equivalent to dplyr and ggplot2 in Python. Those alone can make a huge difference in how many lines you need to write to do something." [44]
- "ggplot2 has plotnine (http://plotnine.readthedocs.io) which has a nearly identical API. I've found though it's not perfect, you can get closer to dplyr with JS-style method chaining on Pandas."
- "I've recently used plotnine, and it's been a relatively good experience, but Pandas is absolute garbage compared to the tidyverse, API-wise."
- "I tried plotnine before and its far from covering everything ggplot can do. And chaining on pandas can make things unreadable compared to dplyr."
- "Non-Computer Scientists seem to have a much easier time with Python than R, anecdotally. I think the reason is that R is not just a badly designed language, but in particular its design is inconsistent. That’s as confusing to newcomers as it is to people who care about PL design. I used R for almost a decade. Last year I switched to Python and Jupyter, never looked back. Can’t recommend the switch highly enough. R has great stats packages, but struggling with the language is just not worth it." [45]
- "R may not be the most "beautiful" language in a general perspective, but it certainly is more beautiful than Python when it comes to actual data analysis. There is nothing in R that is as ugly as even the best implemented pandas, numpy, and matplotlib code. All of the options in Python, which is generally pointed to as the "superior" language to R, feel tacked on and hackish." [46]
- "At any rate, at least it's not Pandas and matplotlib... " [47]
- "I think that Python is probably winning because being a decent language gives you a decent escape hatch, whereas no amount of great libraries can save you from having to go through the bizarro language. That said, R may be bizarro, but at least, once you learn it, it's predictable. Whereas I'm not sure even Pandas really knows whether a given call to .loc will copy or refer to the original data." [48]
- "I learned R coming from Java, Node, PHP and Python and I love it !!! It is awful as an application development programming language, but it was never designed for that purpose. It was designed for STATISTICS. Try to achieve advanced statistics with your traditional software engineer's preferred language and see which language you hate then. The only tricky R concepts to learn for newbies are: recycling, formulas and vectorized functions. Add RevoScaleR to R and it kicks major ass when dealing with big data manipulation. Oh yes, big time !!!" [49]
- "R is much more lisp-like than python: more functional, more emphasis on DSLs and metaprogramming, and the community is far more inviting to New comers: reminds me a lot of Racket in that regards." [50]
- " Having used Python, JSL, Julia, R and Matlab; I agree with most of the things in R. R is an extremely ugly language. It seems to be created by people who wear capris and uggs (both at the same time). But, R has incredible packages, especially the work done by Hadley Wickham. ggplot2 is beautiful. It is utterly gorgeous. It is what Ted Baker is to the capri guys that designed the language itself." [51]
- "My personal hack to deal with the unbearable ugliness of R is to use Rpy2 and call R packages from Python --- at least writing some boilerplate code in Python makes me happier than having to write in R." [52]
- "Second this. Pandas / Numpy / Numba until I hand off to ggplot, lmer, or whatever specialized R package." [53]
- "I am very clear I am an R "consumer" not an R developer. But, at this point, absent Shiny and a gui, I think that Python and Numpy has as much to offer me basically. Some people say the syntax is FP friendly. I have been trying to learn FP in Haskell and I think R is about the worst notation you could invent to sell FP." [54]
- "I use R most of the time and I find R notebooks very data exploration friendly. It makes it easy to back and forth just like Jupyter notebook. Producing HTML files from Rmarkdown files is also analysis friendly. 99% of the time I use tidyverse with no noticeable impact on the performance. For that occasional 1%, I must admit datatable package works out really well. tidyverse pipes are so unixy that makes it easy to transition to command such as cut, head, sort and column if needed without any mental contortion. I have used Python occasionally and with method chaining, it can almost simulate the "dplyr" like syntax. However, it is hard to find some obscure statistical test out of the box which is easy in R." [55]
- "Great guide. R syntax is awful, no matter how you slice it and would have been the tool of choice before Python could walk. Still, it is very powerful and great to have around." [56]
- "Every error message ever written in R: "Error". Hmmmmm, okay after 20 minutes of squinting at the script I see there's a lowercase letter at the header name of the last column somewhere in the middle of the 200 lines...now we can move on to debugging the next "Error"..." [57]
- "I’d rather do math in a general-purpose language than try to do general-purpose programming in a math language." [58]
- "I don't use R to write programs. I wrangle data, run analysis, and plot results. It has a great number of solid stat packages for mixed modeling, clustering, ordination, etc. That is why I use R." [59]
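Several of the quotes above contrast R's generic-function dispatch with message-passing OO, where objects own methods. A minimal S3 sketch (class and function names are my own) of how a generic dispatches on the class of its argument:

```r
# S3: a generic is a plain function that dispatches on the class
# attribute of its first argument via UseMethod().
area <- function(shape) UseMethod("area")
area.circle    <- function(shape) pi * shape$r^2
area.rectangle <- function(shape) shape$w * shape$h
area.default   <- function(shape) stop("no area method for this class")

# Objects are ordinary lists tagged with a class attribute.
c1 <- structure(list(r = 1),        class = "circle")
r1 <- structure(list(w = 2, h = 3), class = "rectangle")

stopifnot(abs(area(c1) - pi) < 1e-12)
stopifnot(area(r1) == 6)
```

The same mechanism drives `print()`, `summary()`, and `plot()`, which is one reason "guessing" the right call in R often works.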
Lists and tours of libraries:
- http://multithreaded.stitchfix.com/blog/2015/03/17/grammar-of-data-science/
- http://blog.rstudio.org/2014/07/22/introducing-tidyr/
- https://news.ycombinator.com/item?id=10388331
- http://www.slideshare.net/wesm/dataframes-the-good-bad-and-ugly defines the 'Hadley stack' as dplyr, tidyr, ggplot2; [60] adds testthat
- https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
- http://dtkaplan.github.io/USCOTS-2015/DataScienceWorkshop/welcome.html#6 mentions tidyr, dplyr, ggplot2, shiny, knitr, rmarkdown
- "I can't stand the non-standard evaluation of the tidyverse. It works great for writing one-off scripts, but as soon as you start trying to put it into functions or your own package it's just not worth the pain of quosures and the tidyeval nonsense that changes every 6 months. " [61]
- "I used to feel similarly, but I think it's much more stable than it was even a year ago; `!!`, `enquo`, and `:=` is good enough for the vast majority of users who want to write their own NSE functions now." [62]
- "There are some highly productive researchers who have taken the time to write well constructed and very useable packages in R, married with meticulous documentation. The same individuals or groups of 1-3 people have also maintained said packages for 5 years or more, and regularly respond in person to queries. Two examples are ggplot2 and limma." [63]
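The tidyeval complaints above concern dplyr's rlang-based non-standard evaluation; the underlying mechanism can be sketched in base R alone (the `filter_df` name is mine, and this is not how dplyr is actually implemented):

```r
# Non-standard evaluation in base R: the function captures its
# argument unevaluated with substitute(), then evaluates it with the
# data frame's columns in scope.
filter_df <- function(df, cond) {
  keep <- eval(substitute(cond), envir = df, enclos = parent.frame())
  df[keep, , drop = FALSE]
}

df <- data.frame(x = 1:5, y = c(10, 20, 30, 40, 50))
out <- filter_df(df, x > 3)   # `x` resolves to df$x, not a global
stopifnot(nrow(out) == 2, all(out$x == c(4, 5)))
```

This convenience at the prompt is exactly what becomes painful inside functions and packages, which is what the quosure machinery tries to address.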
Gotchas:
- if you select a column from an empty dataframe without drop=FALSE, rather than returning an empty column, it returns a NULL [64]
- 'scalars' are just 1-element vectors
- "is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names. Use is.atomic(x) || is.list(x) to test if an object is actually a vector." [65]
- the 'c' atomic vector constructor always produces flat vectors, even if you nest them [66]
- "the pain of needing to specify `stringsAsFactors=FALSE` is solved in the tibble package by setting a sensible default." [67]
- "..here's a tongue in cheek summary of my experience with R. I write a script. It doesn't work. I don't know why. I look at the error message, and then google for 30 minutes to understand what it really means - which parts of the code broke, why, how to fix them. Because none of the 3 things (which, why, how) is easy to get to. OK, I fix it, having learned something new (like that there are infinite special cases with almost any functions). I commit it to repo, go for coffee. In the afternoon, a colleague asks how to run that code. Well, it was a simple script, half a page, what's the problem? I take a look, and on their machine it doesn't run. We don't know why. An hour later we discover she has some R profile file with a setting that changes behavior of some standard library... and she also has different encoding set as default, and so on, and so forth... whatever. I don't know why runtime environment encoding changes behavior of code that only deals with numbers, but hey! It's interesting at least. We fix it, we are happy. A few days later I run the script again. It works. The result doesn't look right though. It's mostly zeroes. Hmm. I run it a few more times, playing around with input, trying to figure out what's up. OK, after a few minutes I realize there's lots of red color that flashes on running the script on my screen - just so fast I barely see it. It turns out half the code isn't really running, the script just ignores it though (errors do NOT stop the code from running), and keeps going. It produces partial output happily announcing it finished. That is the most serious mindfuck. Everything is OK, says the prompt, here's your 1 megabyte result of the calculation, oh, just don't look at the numbers, because I havent' really run any of the code... I couldn't find one of the functions. I sit there wondering. 
Which is worse: the fact that every time I try launching the script something else is happening, or the fact that the runtime environment by default will return garbage with NO warning at the end (which is the only thing you see on screen) but with a million warnings in between (which you won't see unless you have really good reflexes...). Which is worse? I decided at some point, that I want a language to fail, and to always give me the same result. An error, an exception, this should kill the program and shout as loud as possible "Won't give you anything". Also I want code that ran yesterday to run today, and to run on my colleague's machine, and on a newer version of R. This was never our experience. " [68]
- " Sounds like you aren't running your scripts as scripts. If you source a script or run it via Rscript it will halt when it hits a failure (unless you've changed a default). Copying and pasting in to the REPL will hide errors like you describe. The other part is that it sounds like you don't have a standardized R environment. I admit that R's tooling there isn't the best, but there are options, e.g. {packrat} & {lockbox}... or better yet a Docker image. " [69]
- "...some authorities prefer <- for assignment but I’m increasingly convinced that they are wrong; = is always safer. R doesn’t allow expressions of the form if (a = testSomething()) but does allow if (a <- testSomething())."..."you can intend to type if(b < -9) and instead type if(b <- 9). The latter always evaluates to true and assigns the value you're trying to compare with to your variable. This can be extremely difficult to catch and detect..." [70] and [71]
- "I use a lot of R, and like many aspects of it. But the fact that `f(stop("Hi!"))` may or may not throw an error depending on the internals of `f` is a little maddening. (And there are tons of similar issues.)" [72]
- "With regards to matrices, the fact that R has no scalars is exactly the problem. That’s another way of saying it doesn’t distinguish between 0 and 1 dimension. The consequence of that is that operations that increase the dimension result in confusion between N and N+1 dimensions in general! Several years ago I fixed like 10 bugs in a single piece of code related to that issue. It would work for dimension > 1, but fail for dimension == 1 because a 3x1 matrix was fundamentally confused with a vector of length 3 by the language and standard library. I dug up the thread on that here: https://old.reddit.com/r/oilshell/comments/a2atkg/what_is_a_data_frame_in_python_r_and_sql/eazlrl2/ I can probably come up with a concrete example, but bottom line is that I use Python for linear algebra (which is rarely in any case), and R for data manipulation. edit: From reading over that thread again, it is the simplify= issue. This gotcha doesn’t appear in the original post, but I believe it will confuse anyone who has ever used Matlab, NumPy, or Julia. I can’t say for sure but I’m pretty sure none of them have that issue. But I don’t think it is limited to simplify as far as I remember – that’s just a symptom of the problem with the data model itself. " [73]
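Several of these gotchas can be reproduced directly in a fresh session; a short sketch of default base-R behavior:

```r
# 1. Subsetting a matrix down to one row silently drops it to a
#    plain vector unless drop = FALSE is given.
m <- matrix(1:6, nrow = 3)
stopifnot(is.null(dim(m[1, ])))                        # dims dropped
stopifnot(identical(dim(m[1, , drop = FALSE]), c(1L, 2L)))

# 2. c() always flattens; nesting does not create nested vectors.
stopifnot(identical(c(1, c(2, c(3))), c(1, 2, 3)))

# 3. is.vector() fails once any attribute (beyond names) is attached.
v <- 1:3
attr(v, "note") <- "hi"
stopifnot(!is.vector(v), is.atomic(v))

# 4. `<-` inside a condition assigns rather than compares.
b <- 0
if (b <- 9) invisible(NULL)   # condition is 9, i.e. TRUE; b is now 9
stopifnot(b == 9)
```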
Types [74]:
- homogeneous:
- atomic vector (1d) (note: 'scalars' are just 1-element vectors in R)
- logical, integer, double, character (string), complex, raw
- matrix (2d),
- array (n-d)
- heterogeneous:
- list (1d),
- data frame (2d)
- nullable types ("na"); 'na's are typed
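The taxonomy above can be checked interactively with `typeof()`, `dim()`, and `class()`; a base-R sketch:

```r
# Homogeneous types: atomic vectors; matrices and arrays are atomic
# vectors carrying a dim attribute.
v <- c(1.5, 2.5)
stopifnot(is.atomic(v), typeof(v) == "double", length(v) == 2)

m <- matrix(1:6, nrow = 2)           # 2-d
a <- array(1:24, dim = c(2, 3, 4))   # n-d
stopifnot(typeof(m) == "integer", length(dim(a)) == 3)

# 'Scalars' are just length-1 vectors.
stopifnot(is.vector(42), length(42) == 1)

# Heterogeneous types: lists, and data frames (lists of equal-length
# columns with class "data.frame").
l  <- list(1L, "two", TRUE)
df <- data.frame(x = 1:2, y = c("a", "b"))
stopifnot(is.list(l), is.list(df), inherits(df, "data.frame"))

# NAs are typed:
stopifnot(typeof(NA) == "logical",
          typeof(NA_integer_) == "integer",
          typeof(NA_character_) == "character")
```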
R internals
Julia
Julia tutorials
Julia features
"
- Multiple dispatch: providing ability to define function behavior across many combinations of argument types
- Dynamic type system: types for documentation, optimization, and dispatch
- Good performance, approaching that of statically-compiled languages like C
- Built-in package manager
- Lisp-like macros and other metaprogramming facilities
- Call Python functions: use the PyCall package
- Call C functions directly: no wrappers or special APIs
- Powerful shell-like capabilities for managing other processes
- Designed for parallelism and distributed computation
- Coroutines: lightweight “green” threading
- User-defined types are as fast and compact as built-ins
- Automatic generation of efficient, specialized code for different argument types
- Elegant and extensible conversions and promotions for numeric and other types
- Efficient support for Unicode, including but not limited to UTF-8 " [75]
Julia: Dynamism and Performance Reconciled by Design section 4.2.1 shows how multiple dispatch enables a library to compute the derivative of another program without modifying that other program to be aware of the derivative-computing library.
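The dispatch-based trick from section 4.2.1 can be imitated in R with its `Ops` group generic (an illustrative dual-number sketch of my own, not code from the cited paper): arithmetic code that knows nothing about derivatives computes one when handed a "dual" value.

```r
# Forward-mode AD via operator overloading: a "dual" number carries a
# value and a derivative, and the Ops group generic intercepts + and *.
dual <- function(v, d) structure(list(v = v, d = d), class = "dual")

Ops.dual <- function(e1, e2) {
  if (!inherits(e1, "dual")) e1 <- dual(e1, 0)
  if (!inherits(e2, "dual")) e2 <- dual(e2, 0)
  switch(.Generic,
    "+" = dual(e1$v + e2$v, e1$d + e2$d),
    "*" = dual(e1$v * e2$v, e1$d * e2$v + e1$v * e2$d),
    stop("not implemented: ", .Generic))
}

# A function written with no knowledge of duals...
f <- function(x) x * x + 3 * x

# ...differentiates itself when given a dual with derivative 1.
r <- f(dual(2, 1))
stopifnot(r$v == 10)  # f(2)  = 4 + 6
stopifnot(r$d == 7)   # f'(2) = 2*2 + 3
```

Julia's multiple dispatch makes this composition pervasive and fast; the R version only dispatches on one argument's class and stays interpreted, which is part of the paper's point.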
JuliaLang: The Ingredients for a Composable Programming Language
Julia retrospectives
Julia best practices
Julia opinions
- "There is an obvious reason to choose Julia: it's faster than other scripting languages, allowing you to have the rapid development of Python/MATLAB/R while producing code that is as fast as C/Fortran...That sounds like it violates the No-Free-Lunch heuristic. Is there really nothing lost?...Julia is fast because of its design decisions. The core design decision, type-stability through specialization via multiple-dispatch....Type stability is the idea that there is only 1 possible type which can be outputted from a method...There are some "lunches lost" that we will have to understand....The upside is that Julia's functions, when type stable, are essentially C/Fortran functions. Thus ^ (exponentiation) is fast. However, ^(::Int64,::Int64) is type-stable, so what type should it output? 2^5 ((yields)) 32, ((but)) 2^-5 ((yields)) "DomainError: Cannot raise an integer x to a negative power -n. Make x a float by adding a zero decimal (e.g. 2.0^-n instead of 2^-n), or write 1/x^n, float(x)^-n, or (x//1)^-n." [76] "
- "...the choice of unbalanced ends for blocks and ::s for attaching types makes the Julia code appear unnecessarily noisy IMHO. It is often said that code is read more than it is written and in my opinion Julia has definitely room for improvement here. The expression syntax is more reasonable, but there is still some unorthodox choice of operators, punctuators and the syntax for multiline comments that creates WTF moments every now and then (who has come up with this stuff: #= \ $ ?)." [77]
- "One-based indexing is another questionable design decision. While it may be convenient in some cases, it adds a source of mistakes and extra work when interoperating with popular programming languages that all (surprise!) use 0-based indexing..." [78]
- "And the last issue that I want to mention in this section is apidocs. The standard documentation system is a step back even compared to Doxygen, not to mention Sphinx. Instead of using semantic markup it relies on rudimentary Markdown-based format with focus on presentation. Apart from obvious limitations of Markdown, this makes documentation of heterogeneous projects more difficult." [79]
- "JNA- and ctypes-like FFI is convenient, there is no doubt about it. But making it the default way to interface with native APIs is a major safety issue. C and C++ have headers for a reason and redeclaring everything by hand is not only time-consuming, but also error-prone. A little mistake in your ccall and you just happily segfaulted, and that’s an optimistic scenario. And now try to correctly wrap strerror_r or similar...So this “feature” instead of eliminating boilerplate eliminates type checking. In fact, there is more boilerplate per function in ccall than in, say, pybind11. This is another area where Python and even Java with their C API win. There is a way to abuse FFI there too, but at least it’s not actively encouraged." [80]
- "Another area where Julia is lacking at the moment is libraries, including the standard library. As has been pointed out elsewhere “Base APIs outside of the niche Julia targets often don’t make sense” and the general-purpose APIs are somewhat limited." [81]
- "For example, text formatting is one of the most basic and commonly used language facilities one could probably think of and Julia is even behind C++98 there. The standard library provides @printf and @sprintf but they are not extensible. You can’t even make them format a complex number. There is a rudimentary string interpolation, but in its current form it only seems to be useful for very basic formatting." [82]
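A short illustration of the two facilities the quote compares: `@sprintf` covers the classic C formats but has no format letter you could define for your own types (such as a complex number), while `$`-interpolation just splices in a value's default printed form.

```julia
using Printf

s1 = @sprintf("%.3f", pi)   # classic C-style formatting: "3.142"

z = 1 + 2im
# There is no @sprintf format for Complex and no way to register one;
# the fallback is plain interpolation of the default printed form:
s2 = "z = $z"               # "z = 1 + 2im"
```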
- "startup time...a trivial hello world program in Julia runs ~27x slower than Python’s version and ~187x slower than the one in C....it’s not just scripts, Julia’s REPL which should ideally be optimized for responsiveness takes long to start and has noticeable JIT (?) lags...In addition to that, Julia programs have excessive memory consumption...Possible reason for this is the use of LLVM for JIT. LLVM is great as a compiler backend for statically-typed compiled languages, but it has been known not to work equally well in the context of dynamic languages. Unladen Swallow and a recent migration of WebKit away from LLVM are notable examples." [83]
- "The libraries for unit testing are also very basic, at least compared to the ones in C++ and Java. FactCheck is arguably the most popular choice but, apart from the weird API, it is quite limited and hardly developed any more." [84]
- "To summarize, I see the following problems with the language and its infrastructure right now:
Performance issues including long startup time and JIT lags
Somewhat obscure syntax and problems with interoperability with other languages
Poor text formatting facilities in the language and lack of good unit testing frameworks
Unsafe interface to native APIs by default
Unnecessarily complicated codebase and insufficient attention to bug fixing
Despite all this, I think the language can find its niche as an open-source alternative to MATLAB because its syntax might be appealing to MATLAB users. I doubt it can seriously challenge Python as the de-facto standard for numerical computing." [85]
- "Here’s a language that gives near-C performance that feels like Python or Ruby with optional type annotations (that you can feed to one of two static analysis tools) that has good support for macros plus decent-ish support for FP, plus a lot more. What’s not to like? I’m mostly not going to talk about how great Julia is, though, because you can find plenty of blog posts that do that all over the internet." [86]
- "It’s not unusual to run into bugs when using a young language, but Julia has more than its share of bugs for something at its level of maturity. If you look at the test process, that’s basically inevitable. As far as I can tell, FactCheck is the most commonly used thing resembling a modern test framework, and it’s barely used. Until quite recently, it was unmaintained and broken, but even now the vast majority of tests are written using @test, which is basically an assert. It’s theoretically possible to write good tests by having a file full of test code and asserts. But in practice, anyone who’s doing that isn’t serious about testing and isn’t going to write good tests. Not only are existing tests not very good, most things aren’t tested at all." [87]
- "Something that goes hand-in-hand with the level of testing on most Julia packages (and the language itself) is the lack of a good story for error handling. Although you can easily use Nullable (the Julia equivalent of Some/None) or error codes in Julia, the most common idiom is to use exceptions. And if you use things in Base, like arrays or /, you’re stuck with exceptions. I’m not a fan, but that’s fine – plenty of reliable software uses exceptions for error handling...The problem is that because the niche Julia occupies doesn’t care2 about error handling, it’s extremely difficult to write a robust Julia program....There are problems at multiple levels....If I’m writing something I’d like to be robust, I really want function documentation to include all exceptions the function might throw. Not only do the Julia docs not have that, it’s common to call some function and get a random exception that has to do with an implementation detail and nothing to do with the API interface....Another problem is that catching exceptions doesn’t work (sometimes, at random)." [88]
- "Since we’re broadly on the topic of APIs, error conditions aren’t the only place where the Base API leaves something to be desired. Conventions are inconsistent in many ways, from function naming to the order of arguments. Some methods on collections take the collection as the first argument and some don’t (e.g., replace takes the string first and the regex second, whereas match takes the regex first and the string second)." [89]
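The two calls from the quote side by side (using the current `pattern => replacement` form of `replace`):

```julia
# Subject (the string) first in one API...
s = replace("banana", r"a" => "o")   # string first, pattern second
# ...pattern first in the other:
m = match(r"an", "banana")           # regex first, string second
(s, m.match)                         # ("bonono", "an")
```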
- "More generally, Base APIs outside of the niche Julia targets often don’t make sense. There are too many examples to list them all, but consider this one: the UDP interface throws an exception on a partial packet. This is really strange and also unhelpful. Multiple people stated that on this issue but the devs decided to throw the exception anyway. The Julia implementers have great intuition when it comes to linear algebra and other areas they’re familiar with. But they’re only human and their intuition isn’t so great in areas they’re not familiar with. The problem is that they go with their intuition anyway, even in the face of comments about how that might not be the best idea." [90]
- "Another thing that’s an issue for me is that I’m not in the audience the package manager was designed for. It’s backed by git in a clever way that lets people do all sorts of things I never do. The result of all that is that it needs to do git status on each package when I run Pkg.status(), which makes it horribly slow; most other Pkg operations I care about are also slow for a similar reason. That might be ok if it had the feature I most wanted, which is the ability to specify exact versions of packages and have multiple, conflicting, versions of packages installed" [91]
- "There’s lots of friction that keeps people from contributing to Julia. The build is often broken or has failing tests. When I polled Travis CI stats for languages on GitHub, Julia was basically tied for last in uptime. This isn’t just a statistical curiosity: the first time I tried to fix something, the build was non-deterministically broken for the better part of a week because someone checked bad code directly into master without review. I spent maybe a week fixing a few things and then took a break. The next time I came back to fix something, tests were failing for a day because of another bad check-in and I gave up on the idea of fixing bugs." [92]
- "That tests fail so often is even worse than it sounds when you take into account the poor test coverage. And even when the build is “working”, it uses recursive makefiles, and often fails with a message telling you that you need to run make clean and build again, which takes half an hour. When you do so, it often fails with a message telling you that you need to make clean all and build again, which takes an hour. And then there’s some chance that will fail and you’ll have to manually clean out deps and build again, which takes even longer. And that’s the good case! The bad case is when the build fails non-deterministically. These are well-known problems that occur when using recursive make, described in Recursive Make Considered Harmful circa 1997." [93]
- "...the biggest barrier to contributing to core Julia...is that the vast majority of the core code is written with no markers of intent (comments, meaningful variable names, asserts, meaningful function names, explanations of short variable or function names, design docs, etc.)." [94]
- "The metaprogramming in julia is so good I wrote a verilog DSL that transpiles specially written julia into compilable and verifiable verilog - in 3 days." [95]
- "I'm a quite happy Julia user, however I feel there are still some warts in the language that should have warranted a bit more time before banging 1.0 on the badge. Exception handling in julia is poor, which reminds me of how exceptions are (not/poorly) handled in R. Code can trap exceptions, but not directly by type as you _would_ expect. Instead, the user is left to check the type of the exception in the catch block. Aside from creating verbose blocks of boilerplate at every catch, it's very error prone. Very few packages do it right, and like in R, exceptions either blow up in your face or they simply fail silently as the exception is handled incorrectly upstream by being too broad. Errors, warnings and notices are also often written as if the only use-case scenario is the user watching the output interactively. Like with R, it's possible but quite cumbersome to consistently fetch the output of a julia program and be certain that "stdout" contains only what >you< printed. As I also use julia as a general-purpose language to replace python, I feel that julia is a bit too biased toward interactive usage at times. That being said, I do love multiple dispatch, and julia overall has one of the most pragmatic implementations I've come across over time, which also makes me forget that I don't really like 1-based array indexes." [96]
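A sketch of the boilerplate being described (`safe_get` is an invented name): `catch` binds whatever was thrown, and selecting by exception type is a manual `isa` check rather than dispatch or a typed catch clause.

```julia
# Julia has no `catch SomeExceptionType e` form; the catch block
# receives every exception and must triage by hand.
function safe_get(d::AbstractDict, k)
    try
        d[k]
    catch e
        if e isa KeyError      # the manual type check the quote laments
            nothing            # expected case: missing key
        else
            rethrow()          # anything unexpected propagates unchanged
        end
    end
end

safe_get(Dict("a" => 1), "b")  # nothing, rather than an uncaught KeyError
```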
- "Pkg.generate: brilliant" [97] from [98]
- "Readability...unicode operators? Brilliant...sometimes!" [99] from [100]
- "Array indexing...Arbitrary indexing? Brilliant!... 1-based indexing…eww!" [101] from [102]
- "REPL • Shell? Help? C++?!? Brilliant! • REPL as exploration vs REPL as dev tool • workspace()? Brilliant! • method redefinition… boo! (Infamous #265)" [103] from [104]
- The accelerating adoption of Julia provides a long example of the sort of composability that multiple dispatch enables
- https://news.ycombinator.com/item?id=24864087
- "every time I want to do anything with Julia, I need to wait up to a minute or two. Gadfly takes ages to pre-compile. Closed your terminal session? You're outta luck. Wait 2 mins please. Launch a notebook and want to start working on something? Wait 40 seconds. Wanna check something quick in Julia REPL session? Wait 7 seconds. Let me run this Julia script, wait 55 seconds." [105]
lenticular:
It's extremely expressive. Notably, Julia is homoiconic, with full lisp-style macros. It also has multiple dispatch, which is a far more general technique that OO single-dispatch. This makes it very easy to define modular interfaces that work much like statically-typed type classes in Haskell. This allows you, for example, to define a custom matrix type for your bespoke sparse matrix layout and have it work seamlessly with existing linear algebra types.
I've done a lot of work in both Python with Scipy/Numpy, and Julia. Python is painfully inexpressive in comparison. Not only this, but Julia has excellent type inference. Combined with the JIT, this makes it very fast. Inner numerical loops can be nearly as fast as C/Fortran.
Expanding on the macro system, this has allowed things like libraries that give easy support for GPU programming, fast automatic differentiation, seamless Python interop, etc.
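A small sketch of the dispatch style described above (types and function invented for illustration): one generic function whose method is chosen by the types of all arguments, not just the first, which is what makes the "custom type works seamlessly with existing code" pattern possible.

```julia
struct Meters; v::Float64 end
struct Feet;   v::Float64 end

# Methods are selected on every argument's type:
combine(a::Meters, b::Meters) = Meters(a.v + b.v)
combine(a::Meters, b::Feet)   = Meters(a.v + 0.3048 * b.v)
combine(a::Feet,   b::Meters) = combine(b, a)   # reuse via dispatch

combine(Meters(1.0), Feet(10.0))                # Meters(4.048)
```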
tombert:
I'm not sure how I feel about multi-dispatch...I've had a few headaches chasing down problems with multimethods in Clojure...I'd have to try using Julia full-time to see how it feels.
I was unaware that Julia was homoiconic...I'm somewhat of a Lisp fanboy so I might need to give the language another chance.
lenticular:
There's pretty big differences in usage between multimethods in Clojure and Julia. I've used both a decent amount. All functions in Julia are multimethods by default. If you don't use type annotations, a new method will be generated whenever you call the function with new argument types. This explicit type specialization is a very important part of why Julia can have such a consistently fast JIT despite its dynamicity.
Errors from missing or conflicting methods tend to not happen much in practice.
https://docs.julialang.org/en/v0.7.0/manual/methods/
nicoburns:
> If you don't use type annotations, a new method will be generated whenever you call the function with new argument types.
Damn. That's a pretty clever trade off between dynamic and static types.
vanderZwan:
I vaguely recall a talk by Stefan Karpinski where he mentions meeting one of the big names in compiler land (working on V8 or something) and they said Julia's JIT is just a lazily evaluated AOT compiler, and as a result much simpler than the JITs typically seen in other languages.
tombert:
Forgive a bit of ignorance here, but that doesn't sound terribly different than overloading functions in C++. Am I way off on that?
KenoFischer:
The difference is run-time (semantically) vs compile time. If you had overloading at runtime in C++, you wouldn't need virtual functions or any of the thousands of OO "patterns" (visitor, factory, etc.), that are working around the lack of this capability. " -- [106] todo digest this
"
kgwgk:
> Julia is homoiconic
It depends on what you understand by homoiconic:
https://stackoverflow.com/questions/31733766/in-what-sense-a...
eigenspace:
This is why the language creators usually avoid using the word 'homoiconic' because every time one uses that word there's a finite probability of being bogged down in an incredibly uninteresting semantic argument.
Instead, people prefer to say that julia code is just another (tree-like) data-structure in the language and it can be manipulated at runtime with functions or compile time with macros or at parse time with string macros and now with Cassette.jl[1] we can even manipulate the form of code that has already been written and shipped by other packages all with first class metaprogramming tools. It seems to me that even if Julia is not 'truly homoiconic', that we seem to get the touted benefits of homoiconicity to the point that it seems like an unimportant distinction.
[1] https://github.com/jrevels/Cassette.jl
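The "code is just another tree-like data structure" point above can be seen directly: a quoted expression is an `Expr` you can inspect, rewrite, and evaluate.

```julia
ex = :(1 + 2 * 3)     # quoting yields an Expr, not a result
ex.head               # :call -- every expression is a function call
ex.args               # Any[:+, 1, :(2 * 3)]
r1 = eval(ex)         # 7
ex.args[1] = :-       # rewrite the call's head symbol, macro-style
r2 = eval(ex)         # -5
```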
Athas:
Then why not just say 'Julia has macros'? Lightly perusing the description of Julia's features, that seems like a clear way of expressing what it is.
(I also vaguely recall Julia describing its type system as "dependent" in way that goes against convention. Maybe they just liked controversy in the early days!)
eigenspace:
What I just said is why Julia people don’t tend to call it homoiconic or dependently typed. It led to so many semantic arguments that most just talk about actual features such as macros and various multiple dispatch features instead of using words like homoiconicity and dependent typing.
zem:
"macros" are unfortunately used by C and lisp to describe two different things, and both usages are as widely popular as their parent languages (i.e. very). "
-- [107] todo digest this
- https://www.evanmiller.org/why-im-betting-on-julia.html
- https://arstechnica.com/science/2020/10/the-unreasonable-effectiveness-of-the-julia-programming-language/
- "Julia has an expressive but easy-to-read syntax, especially for manipulating arrays." [108]
- "It provides a smooth road to parallel processing of numerical algorithms." [109]
- "...has the benefit of having been designed in the age of Unicode. This, along with several other syntax features, lets math look more like math than in any other programming language." [110]
- multiple dispatch:
- "Clearly, multiple dispatch, or some other way around the expression problem, is necessary for the kind of fluent composability that I’ve described above—but it is not sufficient. Julia has enjoyed an explosive degree of uptake in the scientific community because it combines this feature with several others that make it very attractive to numericists. It leads to fast code out of the box, without needing to jump through hoops. Stanford professor Mykel Kochenderfer, who used Julia to design a system for aircraft collision avoidance that became an international standard, told me that Julia “is high-level and interpretable by a human but it also ran as fast as my highly-optimized C++ code.”" [111]
- "Julia’s unique combination of convenient syntax with uncompromising performance...Although Julia’s solution to this problem attracted scientists and others to the language, this is not the reason for the newfound excitement around the platform. There is something else....the magical (this word cropped up more than once) power of Julia to facilitate collaboration and code reuse...The key to these powers is in Julia’s solution to a different old conundrum, this time from computer science--the expression problem...Multiple dispatch is Julia’s solution to the expression problem. It is also the central organizing principle of the language, which, therefore, is neither object oriented nor functional. Multiple dispatch is something more powerful and general than either the functional or object oriented solution. And multiple dispatch is the secret sauce that gives Julia the power to make simple what is so difficult in most other languages: the free and direct mixing and matching of libraries to do things not imagined by the people who wrote them." [112]
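A compact sketch of the expression-problem claim above (all names invented): new operations can be added to existing types, and new types to existing operations, without editing either side.

```julia
abstract type Shape end
struct Circle <: Shape; r::Float64 end
area(s::Circle) = pi * s.r^2

# New operation on an existing type: no edits to Circle needed.
perimeter(s::Circle) = 2pi * s.r

# New type for the existing operations: no edits to area's callers.
struct Square <: Shape; side::Float64 end
area(s::Square) = s.side^2

total_area(shapes) = sum(area, shapes)
total_area([Circle(1.0), Square(2.0)])   # pi + 4
```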
- "Julia is designed around multiple dispatch, which in other languages is opt-in and comes with performance penalties" [113]
- "The multiple dispatch design was intended to make the language flexible and to be able to express mathematical ideas naturally; but even its designers were surprised by the sheer amount of code reuse in the community that resulted." [114]
- "although the multiple dispatch design and the speed of generated code are usually discussed as separate attributes of the language, some of the speed is actually an effect of Julia’s strategy for function dispatch.)" [115]
- "The Julia programming language aims to decrease the gap between productivity and performance languages. On one hand, it provides productivity features like dynamic typing, garbage collection, and multiple dispatch. On the other, it has a type-specializing just-in-time compiler and lets programmers control the layout of data structure in memory. Julia, therefore, promises scientific programmers the ease of a productivity language at the speed of a performance language." [116]
- "Julia’s design was carefully tailored so that a very small team of language implementers could create an efficient compiler." [117]
- slow startup time [118]
- "I teach a graduate course in optimization methods for machine learning and engineering [119] [120]. Julia is just perfect for teaching numerical algorithms. First, it removes the typical numpy syntax boilerplate. Due to its conciseness, Julia has mostly replaced showing pseudo-code on my slides. It can be just as concise / readable; and on top, the students immediately get the "real thing" they can plug into Jupyter notebooks for the exercises. Second, you get C-like speed. And that counts for numerical algorithms. Third, the type system and method dispatch of Julia is very powerful for scientific programming. It allows for composition of ideas in ways I couldn't imagine before seeing it in action. For example, in the optimization course, we develop a minimalistic implementation of Automatic Differentiation on a single slide. And that can be applied to virtually all Julia functions and combined with code from preexisting Julia libraries." [121]
- "This. Julia's combination of human-readable pseudocode-like code and speed makes it perfect for these applications, and seems to be driving adoption." [122]
- "Also not seen before degree of code reuse." [123]
- " > Due to its conciseness, Julia has mostly replaced showing pseudo-code on my slides. This benefit is really underappreciated IMO — for a lot of "science" applications, the core part of the program should be readable by people who don't program in the language. In research papers, by people who want to understand the fine details of your algorithm, for example. Julia gets closer to "executable pseudocode" than I would have thought possible. " [124]
- "...its design has hit on something that has made a major step forwards in terms of our ability to achieve code reuse. It is actually the case in Julia that you can take generic algorithms that were written by one person and custom types that were written by other people and just use them together efficiently and effectively." Stefan Karpinski
- Q: "Interesting. Can you give an example of generic algorithms plus custom types, in practice?"
- "DiffEq on ForwardDiff Dual numbers to calculate sensitivity via forward mode AD. DiffEq on Tracker's TrackedArray to calculate sensitivity via reverse mode AD. Measurements.jl's numbers that track measurement error, input into any algorithm, to compute the transformed measurement error after the algorithm is applied. NamedDims.jl + Flux.jl to give everything that PyTorch's awesome Named Tensors feature gives. " [125]
- "the combination of unitful and diffeq https://tutorials.juliadiffeq.org/html/type_handling/03-unitful.html" [126]
- "...the core feature that allows this stuff is duck typing, but by itself it's not enough. Your notduck had to not only quack, but other animals have to look at it and act like it's a duck sometimes and like a notduck when you want it to do more than a duck. Multiple Dispatch (plus parametric subtyping) allows you to trivially define both notduck + A (notduck.+(A) in OOP languages) and A + notduck (extending A whatever A is) and it's really fast. That allows for the core Julia concept of specialization, easily customizing the particular behavior of any agent at any point to get both the common behavior right and the extended behavior. For static languages you can implement part of it with, for example, interfaces (you'll face the same restrictions if the language is single dispatch), but even if you can extend the interface freely for already existing objects, there must be an agreement between the multiple packages to comply to the same interfaces (and you might either end with tons of interfaces since there are tons of possible behaviors for each entity and purpose or giant interfaces to fit all). In Julia you can use specialization to surgically smooth the integration between two packages that had no knowledge of each other and didn't even decide to comply to any (informal) interface (which do exist in Julia, like the Julia Array and Tables.jl interfaces). " [127]
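A toy version of the Measurements.jl-style composition in the quotes above (the `Tagged` type and `horner` function are invented): a user-defined number flows through a generic algorithm that was written with no knowledge of it, just by defining the arithmetic methods dispatch needs.

```julia
# A number that carries a label along with its value.
struct Tagged <: Number
    v::Float64
    tag::String
end
Base.:+(a::Tagged, b::Tagged) = Tagged(a.v + b.v, a.tag)
Base.:*(a::Tagged, b::Tagged) = Tagged(a.v * b.v, a.tag)

# A generic Horner evaluator, written for ordinary numbers
# (coefficients in ascending order of power):
horner(coeffs, x) = foldr((c, acc) -> c + acc * x, coeffs)

t = Tagged(2.0, "sensor-7")
p = horner([Tagged(1.0, "sensor-7"), Tagged(3.0, "sensor-7")], t)
# p.v == 7.0 (1 + 3*2), and the tag survived the whole computation
```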
- https://discourse.julialang.org/t/in-as-few-lines-as-possible-describe-why-you-love-julia/
- https://discourse.julialang.org/t/in-as-few-lines-as-possible-describe-why-you-love-julia/33179/9
- https://discourse.julialang.org/t/in-as-few-lines-as-possible-describe-why-you-love-julia/33179/24?u=viralbshah
- "There are parts of Julia I really like but it has some problems.
- Multiple dispatch is an odd design pattern that seems to over complicate things. I know there are people that love it and claim it’s better, but after working with it for some time I just want a struct with methods. It’s much easier to reason about.
- The packaging is cumbersome. I understand their reasoning behind it but in practice it’s just more of a pain than denoting functions as public one way or another.
- The tooling is poor. I work with Go for my day job and its toolchain is an absolute pleasure. The Julia toolchain isn’t in the same arena.
- The JIT is slowwww to boot. I was amazed the first time running a Julia program how slow it was. You could practically compile it faster.
- Editor support has never been great.
- As others have mentioned, type checks don’t go deep enough. I think it has some neat ideas and in certain scientific arenas it will be useful, but IMO they need to focus a bit more on making it a better general purpose language." [128]
- "I've been using Julia along with python and Pytorch, not yet for machine learning until flux is more mature but for NLP scripts, and I have to say that I'm starting to like it. Multiple dispatch, linear algebra and numpy built in, dynamic language but with optional types, user defined types, etc." [129]
- " Julia is great. It’s significantly simpler than Python while also being much more expressive. It’s too bad the typing is only for dispatch, but hopefully someone will write a typechecker someday. I’ve found it surprisingly refreshing to not have to think about classes and just add functions on objects wherever I want. Some languages solve this with monkey patching (which is bad), others like Scala with an extension class (reasonable, but you still don’t get access to private properties), but the Julia approach is cleaner. I wouldn’t use Julia for a non-scientific computing app as I don’t think it’s suitable, but for anything data science related, it’s great! And with the Python interop, I don’t really think there’s any reason not to use Julia for your next data science project. I suspect that over the next 5 years Python will no longer be used for these applications at all. " [130]
- " I've been using Julia for non-scientific computing programs for almost 5 years now, and (especially now that it is stable since the 1.0 release) have found it well suited for general programming as well. Having a language that is easy to write (like Python), runs fast (like C/C++), and incredibly flexible & expressive (like Lisp) makes programming fun again! " Scott P. Jones
- " Julia is a language I really wanted to like, and to a certain extent I do. However after spending some time working with it and hanging out on the Discourse channels (they certainly have a very friendly and open community, which I think is a big plus), I've come to the tentative conclusion that its application domains are going to be more limited than I would have hoped. This article hits on some of the issues that the community tends to see as advantages but that I think will prove limiting in the long run. > Missing features like: > Weak conventions about namespace pollution > Never got around to making it easy to use local modules, outside of packages > A type system that can’t be used to check correctness These are some of my biggest gripes about Julia, especially the last two. To these I would add:
- Lack of support for formally defined interfaces.
- Lack of support for implementation inheritance. Together with Julia's many strengths I think these design choices and community philosophy lead to a language that is very good for small scale and experimental work but will have major issues scaling to very complex systems development projects and will be ill-suited to mission critical applications. In short I think Julia may be a great language for prototyping an object detection algorithm, but I wouldn't want to use it to develop the control system for a self-driving car. Unfortunately this means that Julia probably isn't really going to solve the "2 language problem" because in most cases you're still going to need to rewrite your prototypes in a different language just like you would previously in going from, for example, a Matlab prototype to a C++ system in production. " [131]
- " You touch upon some interesting pain points. I really like Julia and working with it is a pleasure. Except the Module system, which feels unnecessarily arcane. I'm happy to be educated on why, but it seems to successfully combine the awkwardness of C-style #include with the mess of a free-form module system. The end result is a Frankenstein monster where technically everything is possible, everything could be included anywhere, there are no boundaries or even conventions. It makes for a frustrating experience for a newbie. Say you have a package, and inside is a file called `xyz.jl`. You open the file and it defines a module called Xyz. But this tells you absolutely nothing about where in the package Xyz will appear. It could be included somewhere deep in the hierarchy, or it could be a submodule. It could be included multiple times in multiple places! That's bad design for sure, but the language places no boundaries on you. You open another file `abc.jl`, and see no modules at all, just a bunch of functions, which in turn call other functions that are defined God knows where. A julia file does not have to contain any information about where the symbols it's using come from, since it will be just pasted in verbatim to some location somewhere. The whole module system feels like one big spaghetti of spooky action at a distance. It's a shame too, because the rest of the language is very neat. Once one gets over the hurdle of the modules, it is possible to establish conventions to bring some sanity in there, but it's a hurdle that many people will probably not want to deal with. ... Languages like Python, Rust, C# or even Java have module systems that I find are more restrictive, but much easier to follow. You always have the pertinent information at hand. Each file containing code clearly tells you two crucial pieces of information: 1. Where the code fits in the greater picture 2. 
Where the dependencies of the code in a file come from. Python, whose module story is actually pretty poor, is still easier to follow than Julia, because it just matches the file/directory structure. You can reason about the hierarchy of a python library by just navigating the directories. In a normal python project, each file is one module and its dependencies are clearly specified as imports. Rust relies on the file system as well, with much better defined rules than Python. I find this great, because the file system is already hierarchical and we are used to the way it works. When I open a file in a Rust project, I know immediately where it fits in the hierarchy - because it is implied from the file system. Rust gives you a bit more flexibility in that you can define submodules in each file. C# & Java qualify the namespaces fully in each file. While the file structure is not as clear anymore at the file system level, a single file contains all the information necessary to determine where the code fits in and where its dependencies come from. Now let's take Julia: A single module will often be split across multiple files. Since they share a single namespace, the imports happen at the top level where the module is defined and includes all its source files. When you open a source file, you have zero information where all the functions and data types are coming from (or where they are going for that matter). I see the following pattern systematically emerge in Julia code:
- A function is defined in file A
- File A is `included` in file B, where it forms part of module X
- It is then imported into module Y in file C, but it is not actually used there
- As it is finally used in file D, which is `included` in module Y in file C itself. The problem is that there is no link from file A to file B, or module X for that matter. File A could be part of a dozen modules, or zero. Neither is there a link between the usage of the function in file D and where it is coming from. You actually have to find all the places where file D is included, and then check what flavor of the function each location imports. The relationships are established at some indeterminate level in the hierarchy. Again, don't get me wrong, this is just a wart on an otherwise very pleasant language. I wouldn't be complaining if I weren't using it. " [132]
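A runnable reconstruction of the complaint above (file and module names are hypothetical): the included file records nothing about which module it lands in, and the same file can be pasted into several.

```julia
# Write a standalone source file that mentions no module at all.
dir = mktempdir()
file = joinpath(dir, "a.jl")
write(file, "greet() = \"defined in a.jl, owned by whoever includes me\"")

# include() pastes the file's text verbatim into the calling module;
# nothing in a.jl links back to X or Y.
@eval module X
    include($file)
end
@eval module Y
    include($file)
end

X.greet() == Y.greet()   # true: one file, two homes
```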
- What's bad about Julia?
- compile time latency
- large memory consumption
- weak static analysis "It lies somewhere in between Python and Rust in terms of static analysis and safety. You can add type annotations to your functions, but the errors still only appear at runtime, and it's generally considered un-idiomatic to use too many type annotations, with good reason."
- the core language is unstable
- "the developer tooling surrounding Julia is also immature...Julia's built-in Test package is barebones, and does not offer setup and teardown of tests, nor the functionality to only run a subset of the full test suite...The editor experience is not great with Julia. It's getting better, but with the foremost Julia IDE developed by a few people in their spare time, it has all the crashes, slowness and instability you would expect....Static analysis is brand new, and feels like it hasn't yet settled into its final form. It also has no IDE integration...There is no common framework for benchmarking and profiling Julia code. In a single session, you may analyze the same function with BenchmarkTools?, @allocated, Profile, JET, JETTest, @code_native and Cthulhu, which each has to be loaded and launched individually."
- the type system works poorly.
- "In Julia, types can be either abstract or concrete. Abstract types are considered "incomplete". They can have subtypes, but they cannot hold any data fields or be instantiated - they are incomplete, after all. Concrete types can be instantiated and may have data, but cannot be subtyped since they are final...You can't extend existing types with data...Abstract interfaces are unenforced and undiscoverable...even in Base Julia, fundamental types like AbstractSet?, AbstractChannel?, Number and AbstractFloat? are just not documented....A few abstract types in Julia are well documented, most notably AbstractArray? and its abstract subtypes, and it's probably no coindidence that Julia's array ecosystem is so good...Here is a fun challenge for anyone who thinks "it can't be that bad": Try to implement a TwoWayDict?, an AbstractDict? where if d[a] = b, then d[b] = a. In Python, which has inheritance, this is trivial. You simply subclass dict, overwrite a handful of its methods, and everything else works. In Julia, you have to define its data layout first - quite a drag, since dictionaries have a complicated structure (remember, you can't inherit data!). The data layout can be solved by creating a type that simply wraps a Dict, but the real pain of the implementation come when you must somehow figure out everything AbstractDict? promises (good luck!) and implement that..."
- subtyping is an all-or-nothing thing. "each type can only have one supertype, and it inherits all of its methods. Often, that turns out to not be what you want: New types often have properties of several interfaces: Perhaps they are set-like, iterable, callable, printable, etc. But no, says Julia, pick one thing. To be fair, "iterable", "callable" and "printable" are so generic and broadly useful they are not implemented using subtyping in Julia - but doesn't that say something? In Rust, these properties are implemented through traits instead. Because each trait is defined independently, each type faces a smorgasbord of possibilities...Julia does have traits, but they're half-baked, not supported on a language level, and haphazardly used. They are usually implemented through multiple dispatch, which is also annoying since it can make it difficult to understand what is actually being called. Julia's broadcasting mechanism, for example, is controlled primarily through traits, and just finding the method ultimately being called is a pain. Also, since so much of Julia's behaviour is controlled through the type of variables instead of traits, people are tempted to use wrapper types if they want type A to be able to behave like type B. But those are a terrible idea, since it only moves the problem and in fact makes it worse: You now have a new wrapper type you need to implement everything for, and even if you do, the wrapper type is now of type B, and doesn't have access to the methods of A! A good example of the subtyping system not working is Julia's standard library LinearAlgebra. This package uses both wrapper types and traits to try to overcome the limitations of the type system, and suffers from both the workarounds. But an even clearer example of the failure of the type system is its use of big unions, that is, functions whose type signature has arguments of the type "A or B or C or D or E or ...". 
These typically appear in code when you need to add a method to an object, and then discover that the set of types you need to implement it for doesn't fit into the type hierarchy as a single supertype. And why would it? Why is it simply assumed that behaviour is strictly monophyletic? Besides being unwieldy, unions are also un-extendable. And even in Base Julia, those unions can get out of control: If you have Julia at hand, try to type in LinearAlgebra.StridedVecOrMat and watch the horror. The use of such an abomination is a symptom of an unsolved underlying problem with the type system. The consensus on idiomatic Julia seems to be slowly drifting away from leaning on its type system to specify constraints, and towards duck typing and traits. "
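The Python side of the TwoWayDict challenge quoted above really is short. A minimal sketch of the subclass-dict approach the author alludes to (ignoring edge cases such as overwriting existing pairs, or bulk methods like `update` that bypass `__setitem__`):

```python
class TwoWayDict(dict):
    """A dict where setting d[a] = b also makes d[b] = a (minimal sketch)."""

    def __setitem__(self, key, value):
        # Store the pair in both directions.
        super().__setitem__(key, value)
        super().__setitem__(value, key)

    def __delitem__(self, key):
        # Remove both directions of the pair.
        value = self[key]
        super().__delitem__(key)
        if self.get(value) == key:
            super().__delitem__(value)

d = TwoWayDict()
d["celsius"] = "fahrenheit"
print(d["fahrenheit"])  # prints celsius
```

This inherits lookup, iteration, `len`, and everything else from dict for free, which is exactly the "overwrite a handful of its methods" shortcut that Julia's no-concrete-subtyping rule rules out.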
- the iterator protocol is weird and too hard to use (I can't summarize, just go read it)
- functional programming primitives are not well designed
- "map, filter and split are eager, returning Array...Newer versions of Julia introduced Iterators.map and Iterators.filter which are lazy, but using them means breaking backwards compatibility, and also, you have to use the ugly identifier Iterators"
- "Functional programming functions like map and filter can't take functions. That is, I cannot call map(f) and get a "mapper" function. I usually "solve" this by defining imap(f) = x -> Iterators.map(f, x) in the beginning of my files, but honestly, Julia's iterators should work like this by default."
- "What do you think the method eachline(::String) does? Does it iterate over each line of a string? Haha, no, silly you. It interprets the string as a filename, tries to open the file, and returns an iterator over its lines. What? So, how do you actually iterate over the lines in a string? Well, you have to wrap the string in IO objects first. Yeah."
- "that's another gripe, there is no such type as a Path in Julia - it just uses strings. Why not? I honestly don't know, other than perhaps the Julia devs wanted to get 1.0 out and didn't have time to implement them."
- "But Jakob, you say, don't you know about Takafumi Arakaki's amazing JuliaFolds? ecosystem which reimagines Julia's iterator protocol and functional programming and gives you everything you ask for? Yes I do, and it's the best thing since sliced bread, BUT this basic functionality simply can't be a package. It needs to be in Base Julia. For example, if I use Arakaki's packages to create an "iterator", I can't iterate over it with a normal Julia for loop, because Julia's for loops lower to calls to Base.iterate"
- discussion: https://lobste.rs/s/xivnsc/what_s_bad_about_julia#c_6mooqw
- https://viralinstruction.com/posts/goodjulia/
- https://yuri.is/not-julia/
Julia opinionated comparisons
- "Julia is by far the best programming language for analyzing and processing data...Python has much more mature and stable libraries, and it’s used everywhere. (But I really hope Julia overtakes it in the next couple of years because it’s such a well-designed language.)" [133]
- https://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia
- Figure 6 of Julia: dynamism and performance reconciled by design shows that the Julia implementation cost about 25 person-years of effort, which is much less than PyPy (about 113 person-years), V8 (about 163), and HotSpot (about 200). Yet, on their benchmarks, Julia outperforms those languages.
- " Julia is a nice language, it's just tough to compete with Python.
- The beginner experience in Julia is still much worse than it is in Python. Stuff that should work intuitively sometimes doesn't, and when you get a cryptic error message, it's difficult to find relevant help online. And when you do find help, some of it is out of date because the language has changed over the past few years.
- You can squeeze a lot of performance out of Python and the ecosystem of libraries is hard to beat.
- Julia has to be way better than Python to give people an incentive to switch. Being just marginally better in some aspects of the language isn't enough. And it's very difficult to be much better than Python, especially in usability and ecosystem. " [134]
- "Julia offers a wonderful modular ecosystem. This is in no small part due to a clever design decision of language design of combining type genericism with multiple dispatch. For example, Turing.jl for Bayesian Inference plays well with Flux.jl for Neural Networks which plays well with DifferentialEquations?.jl for ODEs. Basically, everything in pure Julia plays nicely with everything else. An example of how this useful: when neural ODES became more popular a couple of years ago, Julia users had to do almost nothing to implement them and extend them. DifferentialEquations?.jl and Flux.jl already played nicely with each other, and you could just run wild. Meanwhile, in Python-land, there are devs building out ODE solvers built in Tensorflow and Pytorch, doing a load of duplicate work because the frameworks don't allow the same level of genericism. The whole ecosystem is like this. So I've decided to stay with Julia. I'm staying with Python too. It's no big deal. " [135]
- "I've been using R nonstop for pretty much 5+ years. I'm happy that there's established competition coming from Python and new competition coming from Julia. Having these languages compete over similar types of programmers pushes each one to be better, which is awesome. I'm not a die-hard R person, I'd be more than happy to switch under the right circumstances. But...I think one thing gets overlooked way too often. For "data scientists" or "statisticians" or [insert new term here], the majority our non-modeling time is spent on just plain old data wrangling. To me, R is unbeatable here. I've tried Python ~2 years ago and pre-1.0 Julia. Using tidyverse you can do pretty much anything to any dataset, often *without a monstrous amount of keystrokes*. (The pipe syntax is awesome). If you really need speed you can always switch over to data.table for uglier but faster code. I really tried but I could never replicate the "brain cycles to keystrokes" speed of R in Python/Julia. That is, being able to intuitively and quickly just convert my thoughts into readable data wrangling code. Sure the base R language is not that "fast" and Julia/Python benchmarks are way faster. But in practice this doesn't matter to me. Most of the performance sensitive packages are written in C/C++/Fortran anyway (rstan, brms, glmnet, caret). I don't care that I could write 3x faster loops. The extra 5 seconds for that one piece of code doesn't make up for the absence of a good data wrangling ecosystem. My message to the Julia team: You can get a very large portion of the R userbase to switch over if you focus on a Julia version of the tidyverse (especially dplyr). I know that DataFrames?.jl exists but it just doesn't even come close. There's a difference between "you can do this in Julia too" and "here's a clean/intuitive way to do this better without extra baggage". I'm sorry if the above seems harsh. I genuinely appreciate the Julia team's efforts. 
I can only imagine how hard it is to create a new language. I just wanted to be honest. " [136]
Julia Internals and implementations
Core data structures: todo
Links:
Number representations
Integers
Floating points todo
array representation
variable-length lists: todo
multidimensional arrays: todo
limits on sizes of the above
string representation
Representation of structures with fields
Julia type system
Julia: dynamism and performance reconciled by design section 4.1 describes some aspects of Julia's type system, and references Julia Subtyping: A Rational Reconstruction for more detail.
Julia variants and variant implementations
- Julia WASM backend: https://tshort.github.io/WebAssemblyCompiler.jl/stable/
- """several Julia constructs are not supported, including: Multi-dimensional arrays (waiting on the Memory type PR) Pointers Union types Exception handling Errors Some integer types (Int16, Int128, ...) BLAS and all other C dependencies """
GENESIS
LabView
Mathematica
"symbolic manipulation and solving"
Simulink
"continuous-time dynamic systems"
Modelica
"mechanical, electrical, etc systems"
Verilog-AMS
"analog and mixed-signal electronics"
Esterel
"reactive control systems"
SBOL
"synthetic biology systems"
Church
http://v1.probmods.org/
WebPPL
http://probmods.org/
Relay IR
https://docs.tvm.ai/langref/index.html
"Relay is a functional, differentiable programming language designed to be an expressive intermediate representation for machine learning systems. Relay supports algebraic data types, closures, control flow, and recursion, allowing it to directly represent more complex models than computation graph-based IRs can. Relay also includes a form of dependent typing using type relations in order to handle shape analysis for operators with complex requirements on argument shapes."
part of the TVM project
Weld IR
https://www.weld.rs/
"A Common Runtime for High Performance Data Analytics"
"Weld is a runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a small common intermediate representation, similar to CUDA and OpenCL?...For example, for Spark, NumPy?, and TensorFlow?, porting over a few Weld operators can increase performance by up to 30x even on some simple workloads!"
https://www.weld.rs/docs/latest/weld/ast/index.html https://www.weld.rs/docs/latest/weld/ast/index.html#enums
https://cs.stanford.edu/~matei/papers/2017/cidr_weld.pdf
https://github.com/weld-project/weld/tree/master/python/grizzly/grizzly https://www.weld.rs/grizzly https://pypi.python.org/pypi/pygrizzly/0.0.1 Grizzly is a subset of the Pandas data analytics library integrated with Weld
https://www.weld.rs/weldnumpy WeldNumpy is a subset of the NumPy numerical computing framework integrated with Weld
"Split annotations are a system that allow annotating existing code to define how to split, pipeline, and parallelize it. They provide the optimization that we found was most impactful in Weld (keeping chunks of data in the CPU caches between function calls rather than scanning over the entire dataset), but they are significantly easier to integrate than Weld because they reuse existing library code rather than relying on a compiler IR. This also makes them easier to maintain and debug, which in turn improves their robustness. Libraries without full Weld support can fall back to split annotations when Weld is not supported, which will allow us to incrementally add Weld support based on feedback from users while still enabling some new optimizations." -- [137] https://github.com/weld-project/split-annotations https://shoumik.xyz/static/papers/mozart-sosp19final.pdf
TensorFlow MLIR
https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md https://www.tensorflow.org/mlir/overview
https://llvm.org/devmtg/2019-04/slides/Tutorial-AminiVasilacheZinenko-MLIR.pdf
https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/Standard.md
Standard Types: "Standard types are a core set of dialect types that are defined in a builtin dialect and thus available to all users of MLIR."
- complex-type
- float-type
- function-type
- index-type
- integer-type
- memref-type
- none-type
- tensor-type
- tuple-type
- vector-type
"Instead of using phi nodes, MLIR uses a functional form of SSA where terminators pass values into block arguments defined by the successor block." [138]. See also [139]
Links:
VAST
https://github.com/trailofbits/vast
" VAST is a new compiler front/middle-end designed for program analysis. It transforms parsed C and C++ code, in the form of Clang ASTs, into a high-level MLIR dialect. The high level dialect is then progressively lowered all the way down to LLVM IR. This progression enables VAST to represent the code as a tower of IRs in multiple MLIR dialects. " -- [140]
SAL
Sal/Svm: An Assembly Language and Virtual Machine for Computing with Non-Enumerated Sets