
proj-oot-ootCompilerNotes1

" I use Vim plugin called vim-godef. It is sort of like ctags, but better. It lets you jump to definition within your codebase as well as Go source. So you are just 'gd' away from learning what your function takes and returns. "

--

" >Also, if you want to refactor an interface, you have to go find all of its (undeclared) implementations more or less by hand.

Or use gofmt -r: http://golang.org/cmd/gofmt/ " --

"As a stand-in for real "X implements Y" declarations, you can always add various cheap statements that cause a compile-time interface-satisfaction check if there weren't such statements already--different phrasings I was playing with last night: http://play.golang.org/p/lUZtDdP5ia "

--

" ekmett Says:

September 20, 2013 at 2:19 pm

Michael Moser,

You’d be surprised.

Due to strict control over side-effects GHC is very good at rejiggering code into a form that does few to no allocations for the kinds of things you usually turn into an inner loop.

Supercompilation (http://community.haskell.org/~ndm/downloads/paper-supero_making_haskell_faster-27_sep_2007.pdf) can even “beat C” in some cases.

Stream fusion (http://metagraph.org/papers/stream_fusion.pdf) and the worker-wrapper transformation (http://www.cs.nott.ac.uk/~gmh/wrapper.pdf) are used heavily in the compiler and core libraries to make it so idiomatic haskell doesn’t have to be slow.

The vector library uses stream fusion to generate rather impressive unrolled loops. This is getting better now with Geoffrey Mainland’s work (http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf). Now idiomatic Haskell code like:

dot v w = mfold' (+) 0 (mzipWith (*) v w)

is starting to even beat hand-rolled C that uses SSE primitives!

The key to being able to do these rewrites on your code is the freedom from arbitrary side-effects. That lets the compiler move code around in ways that would be terrifying and unverifiable in a C/C++ compiler. "

--

hmm, seems like we really need pure fns by default then? or at least a way for there to be no side effects by default within loops?

--

" waps 1 day ago


But there are several things java insists on that are going to cost you in performance in java that are very, very difficult to fix.

1) UTF16 strings. Ever notice how sticking to byte[] arrays (which is a pain in the ass) can double performance in java ? C++ supports everything by default. Latin1, UTF-8, UTF-16, UTF-32, ... with sane defaults, and supports the full set of string operations on all of them. I have a program that caches a lot of string data. The java version is complete, but uses >10G of memory, where the C++ version storing the same data uses <3G.

2) Pointers everywhere. Pointers, pointers and yet more pointers, and more than that still. So datastructures in java will never match their equivalent in C++ in lookup speeds. Plus, in C++ you can do intrusive datastructures (not pretty, but works), which really wipe the floor with Java's structures. If you intend to store objects with lots of subobjects, this will bit you. As this wasn't bad enough java objects feel the need to store metadata, whereas C++ objects pretty much are what you declared them to be (the overhead comes from malloc, not from the language), unless you declared virtual member functions, in which case there's one pointer in there. In Java, it may (Sadly) be worth it to not have one object contain another, but rather copy all fields from the contained object into the parent object. You lose the benefits of typing (esp. since using an interface for this will eliminate your gains), but it does accelerate things by keeping both things together in memory.

3) Startup time. It's much improved in java 6, and again in java 7, but it's nowhere near C++ startup time.

4) Getting in and out of java is expensive. (Whereas in C++, jumping from one application into a .dll or a .so is about as expensive as a virtual method call)

5) Bounds checks. On every single non-primitive memory access at least one bounds check is done. This is insane. "int[5] a; a[3] = 2;" is 2 assembly instructions in C++, almost 20 in java. More importantly, it's one memory access in C++, it's 2 in java (and that's ignoring the fact that java writes type information into the object too, if that were counted, it'd be far worse). Java still hasn't picked up on Coq's tricks (you prove, mathematically, what the bounds of a loop variable are, then you try to prove the array is at least that big. If that succeeds -> no bounds checks).

6) Memory usage, in general. I believe this is mostly a consequence of 1) and 2), but in general java apps use a crapload more memory than their C++ equivalents (normal programs, written by normal programmers)

7) You can't do things like "mmap this file and return me an array of ComplicatedObject[]" instances.

But yes, in raw number performance, avoiding all the above problems, java does match C++. There actually are (contrived) cases where java will beat C++. Normal C++ that is. In C++ you can write self-modifying code that can do the same optimizations a JIT can do, and can ignore safety (after proving to yourself what you're doing is actually safe, of course).

Of course java has the big advantage of having fewer surprises. But over time I tend to work on programs making this evolution : python/perl/matlab/mathematica -> java -> C++. Each transition will yield at least a factor 2 difference in performance, often more. Surprisingly the "java" phase tends to be the phase where new features are implemented, cause you can't beat Java's refactoring tools.

Python/Mathematica have the advantage that you can express many algorithms as an expression chain, which is really, really fast to change. "Get the results from database query X", get out fields x,y, and z, compare with other array this-and-that, sort the result, and get me the grouped counts of field b, and graph me a histogram of the result -> 1 or 2 lines of (not very readable) code. When designing a new program from scratch, you wouldn't believe how much time this saves. IPython notebook FTW !


PaulHoule 9 hours ago


Hadoop and the latest version of Lucene come with alternative implementations of strings that avoid the UTF16 tax.

Second, I've seen companies fall behind the competition because they had a tangled up C++ codebase with 1.5 hour compiles and code nobody really understands.

The trouble I see with Python, Mathematica and such is that people end up with a bunch of twisty little scripts that all look alike, you get no code reuse, nobody can figure out how to use each other's scripts, etc.

I've been working on making my Java frameworks more fluent because I can write maintainable code in Java and skip the 80% of the work to get the last 20% of the way there with scripts..

"

--

" M: You just mentioned pretty-printers; what other programming tools and programming environments do you favor?

K: When I have a choice I still do all my programming in Unix. I use Rob Pike's sam editor, I don't use Emacs. When I can't use sam I use vi for historical reasons, and I am still quite comfortable with ed [laughing]; I know, that's even before you guys where born. And it's partly a question of history: I knew Bill Joy when he was working on vi.

I don't use fancy debuggers, I use print statements and I don't use a debugger for anything more than getting a stack trace when the program dies unexpectedly. When I write code on Windows I use typically the Microsoft development environment: they know where all the files are, and how to get all the include files and the like, and I use them, even though in many respects they don't match the way I want do business. I also use good old-fashioned Unix tools; when I run Windows I typically import something like the mks toolkits and I have standard programs that I'm used to for finding things and comparing them and so on.

" -- http://www.cs.cmu.edu/~mihaib/kernighan-interview/

--

Nock hints (lines 32-33 of the Nock 5k spec http://www.urbit.org/2013/08/22/Chapter-2-nock.html , operation 10 in its two forms): in one form, an argument (which may be a computed value) is given which is marked as a hint and then thrown away, in another form, an argument

--

" Actually, the hardest problem was getting the instrumentation agent to identify suspendable Clojure functions. This is quite easy with Java Quasar code as suspendable methods declare themselves as throwing a special checked exception. The Java compiler then helps ensure that any method calling a suspendable method must itself be declared suspendable. But Clojure doesn’t have checked exceptions. I though of using an annotation, but that didn’t work, and skimming through the Clojure compiler’s code proved that it’s not supported (though this feature could be added to the compiler very easily). In fact, it turns out you can’t mark the class generated by the Clojure compiler for each plain Clojure function in any sensible way that could be then detected by the instrumentation agent. Then I realized it wouldn’t have mattered because Clojure sometimes generates more than one class per function.

I ended up on notifying the instrumentation agent after the function’s class has been defined, and then retransforming the class bytecode in memory. Also, because all Clojure function calls are done via an interface (IFn), there is no easy way to recognize calls to suspendable functions in order to inject stack management code at the call-site. An easy solution was just to assume that any call to a Clojure function from within a suspendable function is a call to a suspendable function (although it adversely affects performance; we might come up with a better solution in future releases). "

---

http://gcc-melt.org/

--

http://nick-black.com/dankwiki/index.php/AVX

http://nick-black.com/dankwiki/index.php/VEX

--

this was really annoying. Also, something like "bundle" should be part of Oot, not an add-on project.

" bshanks@bshanks:~/prog/backup-pinterest$ bundle install Fetching source index for https://rubygems.org/ 99999Using rake (10.0.3) Installing addressable (2.3.2) Using bundler (1.0.17) Using mime-types (1.19) Installing nokogiri (1.5.6) with native extensions /home/bshanks/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/installer.rb:533:in `rescue in block in build_extensions': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError?)

        /home/bshanks/.rvm/rubies/ruby-1.9.2-p290/bin/ruby extconf.rb
        checking for libxml/parser.h... no

libxml2 is missing. please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.


*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary libraries and/or headers. Check the mkmf.log file for more details. You may need configuration options.

"

http://nokogiri.org/tutorials/installing_nokogiri.html

"

Ubuntu / Debian

Ubuntu doesn’t come with the Ruby development packages that are required for building gems with C extensions. Here are the commands to install everything you might need:

  1. ruby developer packages:
     sudo apt-get install ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8
     sudo apt-get install libreadline-ruby1.8 libruby1.8 libopenssl-ruby
  2. nokogiri requirements:
     sudo apt-get install libxslt-dev libxml2-dev
     sudo gem install nokogiri

Although, if you’re using Hardy (8.04) or earlier, you’ll need to install slightly different packages:

  1. nokogiri requirements for Hardy (8.04) and earlier:
     sudo apt-get install libxslt1-dev libxml2-dev

As John Barnette once said, “Isn’t package management convenient? :)” "

--

http://prog21.dadgum.com/6.html

" More than just executing code, I'd want to make queries about which registers a routine changes, get a list of all memory addresses read or modified by a function, count up the cycles in any stretch of code. "

(about assembly, but still relevant)

--

"There's a simple rule about tail recursive calls in Erlang: If a parameter is passed to another function, unchanged, in exactly the same position it was passed in, then no virtual machine instructions are generated....Perhaps less intuitively, the same rule applies if the number of parameters increases in the tail call. This idiom is common in functions with accumulators"

"In fact, just about the worst thing you can do to violate the "keep parameters in the same positions" rule is to insert a new parameter before the others, or to randomly shuffle parameters."

"These implementation techniques, used in the Erlang BEAM virtual machine, were part of the Warren Abstract Machine developed for Prolog. "

-- http://prog21.dadgum.com/1.html

--

" The Go compiler does not support incremental or parallel compilation (yet). Changing one file requires recompiling them all, one by one. You could theoretically componentize an app into separate packages. However it appears that packages cannot have circular dependencies, so packages are more like libraries than classes. " -- http://ridiculousfish.com/blog/posts/go_bloviations.html#go_compiletimes

--

scheme apparently used to have a command 'expand-once' which expanded macros etc. by only one step at a time, assisting in debugging things like macros (mentioned by [1])

that site also mentions using 'trace' somewhere.

---

oot should have a short-ish official 'best practices and gotchas' document (and a longer one linked from that one)

---

a lot of Python packages suggest installing them using "pip install PACKAGENAME".

but if you do that without root access, you get a confusing stack trace terminating in a "Permission Denied"

you can do "sudo pip install" but that's not per-user, and you may not have root access anyway.

if you google for https://www.google.com/search?client=ubuntu&channel=fs&q=pip+install+%22Permission+denied%3A%22&ie=utf-8&oe=utf-8

the top hit is a stackoverflow answer that links to a stackoverflow answer that recommends either (a) using a virtualenv or (b) using sudo

imo the default should be per-user installation

there is a way to do this with pip; you use the --user flag: http://stackoverflow.com/questions/7143077/how-can-i-install-packages-in-my-home-folder-with-pip
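for reference, the per-user form is just the normal command plus --user (the package name here is only an example); scripts it installs land in ~/.local/bin, which may need to be added to PATH:

    pip install --user requests
    python -m site --user-site    # prints the per-user site-packages directory that --user installs into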

---

std formatting:

" foo; bar; if (a) {...} if (a) { if (b) { if (c) { .... } } } "

-->

" foo; bar; if (a) {...} if (a) { if (b) { if (c) { .. }}} "

in other words:

---

compiler API must output the dependency graph between source code files, and between modules, and between source code files and modules

"In any case, a compiler cannot just be a dumb source-to-object-file converter: it must know how to emit dependencies of files (e.g., gcc -M). There is no standardized format for this information, except perhaps a Makefile stub."

"A build system needs to know what order to build source files in; however, the canonical source for this information is inside the import/include declarations of the source file"

---

" The dependency problem is further exacerbated when module dependencies can be cyclic. A build system must know how to resolve cycles, either by compiling strongly connected components of modules at a time, or compiling against "interface" files, which permit separate compilation. This was one of the problems which motivated the Rust developers to not expose a one-source-at-a-time compiler. "

---

" Certain language features are actively hostile to build systems; only the compiler has enough smarts to figure out how to manage the build. Good examples include macros (especially macros that can access the filesystem), other forms of compile-time metaprogramming, and compiler plugins. "

---

" Unfortunately, I don't know what the correct API is, but here is a strawman proposal: every compiler and build system writer should have an alternative mode which lets a user ask the query, "How do I make $output file?" This mode returns (1) the dependencies of that file, and (2) a recipe for how to make it. The idea is to place the dependency-finding logic in the compiler (the canonical place to put it), while letting an external tool actually handle building the dependencies. "

---

" computing dependencies for a program can be difficult, and in particular it may be impossible to do in one pass: you may need to compile (or at least type-check) some parts of the module to have enough information to resolve dependencies in later parts of the module. "

---

http://blog.ezyang.com/2015/12/the-convergence-of-compilers-build-systems-and-package-managers/

gives pros and cons to the design decision to combine these three things (apparently Go and Rust ship with build systems?)

---

" IDE operational support (e.g. error reporting, code formatters, autocomplete, refactoring, debugging, etc). It would be very valuable for a new language to standardize some of these facilities as part of its core toolchain. Go has already done so with formatting. Easy integration with anyone’s favorite editor should help a lot with adoption. "

---

" A potential option for your proposed build output format is http://nixos.org/nix/ expressions.

Nix is already kind of a fusion of a package manager and a build system. It’s rather mature, has a decent ecosystem and does a lot of what he is looking for:

– Complete determinism, handling of multiple versions, total dependency graphs – Parallel builds (using that dependency graph) – Distribution

One major benefit of a solution like generating Nix files over compiler integration is that it works for cross-language dependencies. A lot of the time integrated solutions break down is for things like if a C extension of a Ruby gem relies on the presence of imagemagick. Nix has no problem handling that kind of dependency.

Also of course it is a lot less work to generate Nix expressions than it is to write a package manager. There are already scripts like https://github.com/NixOS/cabal2nix which already solve problems of the packaging system they replace. "

---

Jonathan says: December 8, 2015 at 1:40 am

I think that most language designers seriously underestimate how hard it is to do packaging properly. Most language ecosystem packaging systems have some serious shortcomings. My advice for any would-be packaging system inventor is: at least consider how hard the job will be for a distribution packager to take your modules and package them up for their distribution. Perl got this right (possibly by accident); Ruby got it wrong.

---

" Flavio says: December 8, 2015 at 7:57 am

For Haskell specifically, suppose your project uses libraries X and Y, which respectively depend transitively on versions 1.0 and 2.0 from another library Z, then the compiler in theory is able to track down all uses of DataType X and figure out if it is crossing boundaries from where version 1.0 uses it to where version 2.0 uses it (in most real programs it doesn’t).

That would allow us to cut down a lot of build failures where the transitive use of different library versions would not cause bugs/problems…

GHC would also need to annotate the exported names of functions, datatypes, typeclasses with some versioning or hashing scheme…

... "

---

"

nwmcsween 27 days ago

Language specific tools (compiler in this case) could allow for things such as checking if library foo's api which library bar 'emulates' is sufficient for program c which depends on foo's API. With a lot more effort you could automate all dependency tracking and alternative library compatibility. "

---

" fizixer 27 days ago

There is a problem but you're looking at it the wrong way.

What's in the compiler's machine code generation phase that the build system needs to know about? If nothing, then making a monolithic system is only going to make your life miserable.

Well-designed compilers are already split into (at least) two subsystems: frontend and backend. Frontend takes the program and spits out an AST (very roughly speaking, although there is also the semantic-analysis phase, and intermediate-representation-generation phase). Backend is concerned with code generation. What your build system needs is just the frontend part. Not only your build system, but also an IDE can benefit greatly from a frontend (which as one of the commenters pointed out, results in wasteful duplication of effort when an IDE writer decides to roll his/her own language parser embedded in the tool).

I think AST and semantic-analyzer are going to play an increasing role in a variety of software development activities and it's a folly to keep them hidden inside the compiler like a forbidden fruit.

And that's the exact opposite of monolith. It's more fragmentation, and splitting the compiler into useful and reusable pieces.

(At this point I have a tendency to gravitate towards recommending LLVM. Unfortunately I think it's a needlessly complicated project, not the least because of being written in a needlessly complicated language. But if you're okay with that it might be of assistance to your problems)


ezyang 27 days ago

I think it is uncontroversial that many tools would benefit from access to the frontend. But what is the interface to the frontend? There is tremendous diversity among frontends; it's the key distinguishing feature of most languages. What exactly are you going to expose? How should the build system interact with this?

LLVM is a great example of a modular compiler which is a pain to program against, because it is being constantly being refactored with BC-breaking changes. As silvas has said, "LLVM is loosely coupled from a software architecture standpoint, but very tightly coupled from a development standpoint". <https://lwn.net/Articles/583271/> In contrast, I can generally expect a command mode which dumps out Makefile-formatted dependencies to be stable across versions. Modularity is not useful for external developers without stability!


fizixer 26 days ago

> What exactly are you going to expose? How should the build system interact with this?

The case of AST is relatively clear. You create a library module, or a standalone program, that takes source code and generates an AST. The AST could be in JSON format, s-expression (that would be my choice), heck even XML. It's better if the schema adheres to some standard (I think LLVM AST can be studied for inspiration, or even taken as standard) but even if it's not, that's a problem orthogonal to that of monolithic vs modular. Once you have an source-to-AST converter, it can be used inside a compiler, text-editor, build system, or something else. These are all "clients" of the source-to-AST module.

I'm not too sure about semantic-analysis since I'm studying it myself at the moment. All I can say at the moment (without expert knowledge of formal semantics) that once AST is used in more and more tools, semantic analysis would follow, and hopefully conventions would emerge for treating it in a canonical fashion. Short of that, every "AST client" can roll their own ad-hoc semantic-analyzer built-into the tool itself. Note that it would still be way more modular than a monolithic design.


henrikschroder 27 days ago

> I think AST and semantic-analyzer are going to play an increasing role in a variety of software development activities

It would be fantastic if source control systems would work on the AST instead of the plain text files, so many annoying problems could be solved there.


Ace17 27 days ago

There are simpler and less intrusive ways to solve outside-AST issues (like an automatic reformatting pass before each build or each diff)

What you're suggesting raises a bunch of new (non-trivial) issues:


sitkack 27 days ago

> - What would you do with code comments? Things like "f(/*old_value*/new_value)".

Comments are included in the AST, the AST should be reprojectable into canonical plaint text.

> - How to store code before preprocessing (C and C++) ?

This could get tricky, punt. cdata

> - How to store files mixing several languages (PHP, HTML, Javascript) ?

Same file format, different semantics. PHP is a DSL.

My new language manifesto includes having a mandatory publicly defined AST.


marssaxman 26 days ago

If it is reprojectable into canonical plain text, it's not really an AST - just an ST.


"

---

"

haberman 27 days ago

This analysis misses one major part of the equation: configuring the build. Almost every non-trivial piece of software can be built in multiple configurations. Debug vs. Release, with/without feature X, using/not using library Y.

The configuration of the build can affect almost every aspect of the build. Which tool/compiler is called, whether certain source files are included in the build or not, compiler flags (including what symbols are predefined), linker flags, etc. One tricky part about configuration is that it often needs a powerful (if not Turing-complete) language to fully express. For example, "feature X can only be enabled if feature Y is also enabled." If you use the autotools, you write these predicates in Bourne Shell. Linux started with Bourne Shell, then Eric Raymond tried to replace it with CML2 (http://www.catb.org/~esr/cml2/), until a different alternative called LinuxKernelConf won out in the end (http://zippel.home.xs4all.nl/lc/).

Another thing missing from the analysis are build-time abstractions over native OS facilities. The most notable example of this is libtool. The fundamental problem libtool solves is: building shared libraries is so far from standardized that it is not reasonable for individual projects that want to be widely portable to attempt to call native OS tools directly. They call libtool, which invokes the OS tools.

In the status quo, the separation between configuration and build system is somewhat delineated: ./configure spits out Makefile. But this interface isn't ideal. "make" has way too much smarts in it for this to be a clean separation. Make allows predicates, complex substitutions, implicit rules, it inherits the environment, etc. If "make" was dead simple and Makefiles were not allowed any logic, then you could feasibly write an interface between "make" and IDEs. The input to make would be the configured build, and it could vend information about specific inputs/outputs over a socket to an IDE. It could also do much more sophisticated change detection, like based on file fingerprints instead of timestamps.

But to do that, you have to decide what format "simple make" consumes, and get build configuration systems to output their configured builds in this format.

I've been toying around with this problem for a while and this is what I came up with for this configuration->builder interface. I specified it as a protobuf schema: https://github.com/haberman/taskforce/blob/master/taskforce....


Tyr42 27 days ago

"simple make" is ninja, right?

https://ninja-build.org/

Already works with cmake and so on.


GFK_of_xmaspast 27 days ago

Doesn't work with fortran, which I found out to my chagrin just this afternoon.


iainmerrick 27 days ago

Why doesn't it work? Ninja should work with any tools that you can run from the command line.


GFK_of_xmaspast 27 days ago

Things like module dependencies, apparently: https://groups.google.com/forum/#!searchin/ninja-build/fortr... "

---

"indentation, highlighting, auto complete, jump to definition"

---

" [–]dmwit 20 points 28 days ago

And then you have tools like TeX, where through careful misdesign you can't really know what got used until the compilation is completely done once, and moreover can't even know what sequence of commands will build your artifact before you start running them. (Think of tools like rubber and latexmk that exist to inspect the output of the standard compiler to check whether to run a command/what command to run next. Ugh.)


[–]thang1thang2 9 points 27 days ago

Careful misdesign? It was made to run on arbitrarily large files and outputs using hardware so pathetic you wouldn't even buy a toaster with it.

"

---

must make it dead simple for someone to package their program into a static executable -- apparently python virtualenv uses the system python version [2], we need to provide more isolation, esp. since we plan to change the language so quickly at first

---

haberman 11 hours ago

> which they could have done with Python

I don't think they could have created a statically-linked Python binary that contained the entire Python installation and core libraries, and the entire application in less than 5 MB. The entire app in a self-contained executable. No futzing with PYTHONPATH, LD_LIBRARY_PATH, virtualenv, installation, conflicting with the system-installed Python, etc. That's what Lua let them do.

But don't take it from me. Here is Guido van Rossum last year:

> The final question was about what he hates in Python. "Anything to do with package distribution", he answered immediately. There are problems with version skew and dependencies that just make for an "endless mess". He dreads it when a colleague comes to him with a "simple Python question". Half the time it is some kind of import path problem and there is no easy solution to offer.

https://lwn.net/Articles/651967/


 mixmastamyk 9 hours ago

PyInstaller, Py2exe.


monkmartinez 5 hours ago

Those are not good options, seriously lacking in one form or another. Don't take my word for it: https://glyph.twistedmatrix.com/2015/09/software-you-can-use...


https://glyph.twistedmatrix.com/2015/09/software-you-can-use.html

mixmastamyk 5 hours ago

I've used py2exe extensively and was happy with it. I also used it with python 2.x despite the article's assertion.

You can also zip an app (think .jar) since 2.6, with the zipapp module available in 3.5.


 mixmastamyk 5 hours ago

That article's dismissal of py2exe is false.


BuckRogers 1 hour ago

This is really meant for monkmartinez, but you're right.

I'd like to add, that blog can be dismissed outright. It's also out of date in regards to PyInstaller. It does support Python3 now. 2.7;3.3-3.5. I've used PyInstaller for years and prefer it over the other packaging solutions, it's really good.

I wouldn't be digging up old blogs as a source for dirt and relying on any information within without verifying it was or still is true.

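for the zipapp route mentioned above, the standard-library tool is a one-liner; the directory layout and entry point below are hypothetical, and the result still needs a Python interpreter on the target machine (so it isn't the fully static binary discussed earlier):

    # myapp/ contains main.py (defining run()) plus any vendored dependencies
    python -m zipapp myapp -m "main:run" -p "/usr/bin/env python3" -o myapp.pyz
    ./myapp.pyz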

---

need to be able to 'strip' but save debugging symbols to a separate .dbg file so that eg you can ship a binary for embedded systems but then the developer can debug a stacktrace from that system, as is done here (search for '.dbg'): http://blog.httrack.com/blog/2013/08/23/catching-posix-signals-on-android/
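the usual GNU binutils recipe for this (and roughly what the linked post does; 'myprog' is a placeholder name) is:

    objcopy --only-keep-debug myprog myprog.dbg    # save full debug info to a separate file
    strip --strip-debug --strip-unneeded myprog    # ship this stripped binary
    objcopy --add-gnu-debuglink=myprog.dbg myprog  # record where gdb can find the symbols later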

---

this is cool:

https://shaunlebron.github.io/parinfer/

---

J has a cool built-in debugger that allows you to set breakpoints and to step and to go to the interactive REPL upon breaking. It can also be set to continue until there's an error and then go into the interactive REPL (so you can examine variables and stuff).

[3] [4]

---

"haskell-ide-engine was set up with the goal to create a standard backend for Haskell IDEs"

https://github.com/haskell/haskell-ide-engine

---

something like "crystal play": search for 'crystal play' in https://blog.codeship.com/an-introduction-to-crystal-fast-as-c-slick-as-ruby/

---

could allow, on the commandline, some public keys to execute unverified code

---

" While the current situation is good, it’s still not great.

Here are just a few of my desiderata:

    We still need better and more universally agreed-upon tooling for end-user deployments.
    Pip should have a GUI frontend so that users can write Python stuff without learning as much command-line arcana.
    There should be tools that help you write and update a setup.py. Or a setup.python.json or something, so you don’t actually need to write code just to ship some metadata.
    The error messages that you get when you try to build something that needs a C compiler and it doesn’t work should be clearer and more actionable for users who don’t already know what they mean.
    PyPI should automatically build wheels for all platforms by default when you upload sdists; this is a huge project, of course, but it would be super awesome default behavior."

---

" I kept finding myself trying to hack things together to make it easy for other people to just clone the repo and have it work. This is not as easy as it sounds with Go. Other languages are much easier to work with since they don’t force you to have a GOPATH or to have your code structured into particular directories inside that path. And to make matters worse, you have to install another tool like godep to vendor your dependencies and that has it’s own set of issues like forcing you to have your code in git. "

---

it's ok if the compiler runs slow in -O3 mode. But it must be very fast in default mode.

---

"OCaml's compilation command takes in a `-pp` flag (preprocessor), which accepts a command that takes in a file and outputs an OCaml AST tree."

---

ars 1 day ago [-]

I don't know if anyone from the team is reading this, but I'll tell you what I want from a Javascript debugger:

I want to be able to modify/add code, try it out, then rewind, and try something else. Then save my changes once they work perfectly.

Simply setting breakpoints and watching variables is nice, but I can do basically the same thing by echoing/console.log'ing things. If you are going to go to the effort of making a debugger, then make it do something I can't do another way.


---

" Rust should integrate easily into large build systems

When working with larger organizations interested in using Rust, one of the first hurdles we tend to run into is fitting into an existing build system. We've been exploring a number of different approaches, each of which ends up using Cargo (and sometimes rustc) in different ways, with different stories about how to incorporate crates from the broader crates.io ecosystem. Part of the issue seems to be a perceived overlap between functionality in Cargo (and its notion of compilation unit) and in ambient build systems, but we have yet to truly get to the bottom of the issues—and it may be that the problem is one of communication, rather than of some technical gap.

By the end of 2017, this kind of integration should be easy: as a community, we should have a strong understanding of best practices, and potentially build tooling in support of those practices. And of course, we want to approach this goal with Rust's values in mind, ensuring that first-class access to the crates.io ecosystem is a cornerstone of our eventual story. " -- https://github.com/aturon/rfcs/blob/roadmap-2017/text/0000-roadmap-2017.md

---

optforfon 1 day ago [-]

The problem is immature tooling. There is no feedback loop from the compiler's generated assembly back to the IDE. We finally have libclang which sorta does some stuff (I'm not entirely sure how far it can go) - but I'm honestly not seeing any work being done in this direction on the IDE level. After all these years of C++ development, why doesn't the IDE do something as simple as tell me if a function is being inline or not is beyond me (that's the tip of the iceberg in terms of what I want to know).

When I asked people at CppCon? about it I just got some shrugs and was told "just go look at the assembly".

Another solution is profiling - but that's got a slow turn around, and it can be hard to narrow down problem areas.


junke 1 day ago [-]

For Lisp, SBCL is known for giving optimizations notes:

    ; note: forced to do GENERIC-+ (cost 10)
    ;       unable to do inline fixnum arithmetic (cost 2) because:
    ;       The first argument is a NUMBER, not a FIXNUM.
    ;       The second argument is a (INTEGER -19807040623954398379958599680
    ;                                 19807040623954398375663632385), not a FIXNUM.
    ;       The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES FIXNUM &REST T).
    ;       unable to do inline (signed-byte 64) arithmetic (cost 5) because:
    ;       The first argument is a NUMBER, not a (SIGNED-BYTE 64).
    ;       The second argument is a (INTEGER -19807040623954398379958599680
    ;                                 19807040623954398375663632385), not a (SIGNED-BYTE
    ;                                                                        64).
    ;       The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES (SIGNED-BYTE 64)
    ;                                                                &REST T).
    ;       etc.

Note also that the approach is different there: integers are not modular, but you can still perform modular arithmetics if the range of values is adequate (http://www.sbcl.org/manual/#Modular-arithmetic).


pjmlp 1 day ago [-]

Starting with VS 2015, Visual Studio does show execution time for each function call during a debugging session, while stepping.

Also switching between source view and Assembly alongside original source view is a key away (F12) for all Microsoft languages.


---

" Elixir actually comes with a lot of deployment options, but the primary method is via the excellent distillery package. This wraps up your Elixir application into a binary with all of its dependencies that can be deployed to its destination.

The biggest difference between ((this and Golang)) is that compilation for the destination architecture has to be done on that same architecture. "

---

Emacs and Lisp paredit demo:

https://www.youtube.com/watch?v=D6h5dFyyUX0

---

to get reproducable builds:

" Sources of difference

Here's is what we found that we needed to fix, how we chose to fix it and why, and where are we now.

There are many reasons why two separate builds from the same sources can be different. Here's an (incomplete) list:

    timestamps
    Many things like to keep track of timestamps, specially archive formats (tar(1), ar(1)), filesystems etc. The way to handle each is different, but the approach is to make them either produce files with a 0 timestamp (where it does not matter like ar), or with a specific timestamp when using 0 does not make sense (it is not useful to the user).
    dates/times/authors etc. embedded in source files
    Some programs like to report the date/time they were built, the author, the system they were built on etc. This can be done either by programmatically finding and creating source files containing that information during build time, or by using standard macros such as __DATE__, __TIME__ etc. Usually putting a constant time or eliding the information (such as we do with kernels and bootblocks) solves the problem.
    timezone sensitive code
    Certain filesystem formats (iso 9660 etc.) don't store raw timestamps but formatted times; to achieve this they convert from a timestamp to localtime, so they are affected by the timezone.
    directory order/build order
    The build order is not constant especially in the presence of parallel builds; neither is directory scan order. If those are used to create output files, the output files will need to be sorted so they become consistent.
    non-sanitized data stored into files
    Writing data structures into raw files can lead to problems. Running the same program in different operating systems or using ASLR makes those issues more obvious.
    symbolic links/paths
    Having paths embedded into binaries (specially for debugging information) can lead to binary differences. Propagation of the logical path can prove problematic..." -- [5]
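for the timestamp items specifically, the common fixes are the SOURCE_DATE_EPOCH convention plus forcing archive metadata to known values; e.g. (GNU tar/ar flags; file names are placeholders):

    export SOURCE_DATE_EPOCH=0                  # many tools substitute this for "now" when embedding dates
    tar --sort=name --owner=0 --group=0 --numeric-owner \
        --mtime="@${SOURCE_DATE_EPOCH}" -cf out.tar build/
    ar rcD libfoo.a foo.o bar.o                 # D = deterministic mode: zeroed timestamps/uids/gids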

---

gumby 4 hours ago [-]

At Cygnus we had a customer from the telecom industry. They had SLAs with their own customers that included terms like "no more than 10 minutes of downtime per decade". They paid a LOT of money to have one, consistent release (no upgrades, only bug fixes); when they reported a bug and got a fix they would diff the new binary and required that every change could be traced solely to the patch issued and nothing else.

Satisfying this made GCC a lot better.

reply

---

ymse 20 hours ago [-]

> Now it really suffices if make triggers a target if one of the inputs changed.

This is exactly how Nix and Guix works. If any inputs change, it will force a rebuild of all dependent packages.

reply

---

"One observation from watching Go and Rust gain popularity is that having an online code evaluation tool like https://play.rust-lang.org/ or https://play.golang.org/ can do wonders for adoption. "

---

runtime 'deterministic jail' option that:

---

'check' mode where everything is checked (syntax errors, type errors, etc) but no compile target is produced

---

https://kite.com/#

this looks awesome in itself, it's also worth noting which editors it integrates with:

" Kite has deep editor integration with Atom, Sublime Text 3, and PyCharm? / IntelliJ?.

The Kite Sidebar also supports vim, VS Code, neovim, and emacs. "

---

wahern 2 days ago [-]

C++ only supports capturing a variable with a lifetime at least as long as the closure. In other words, you can't _return_ a closure from a function that captures an automatic variable (i.e. "stack-allocated" variable), or pass the closure to another thread of control such that it outlives the original function invocation. AFAIK Go will automatically heap allocate any variable that is captured. (Theoretically, it could attempt to prove it won't out-live the enclosing function(s).) Automatic heap allocation requires automated GC.

C++ has lots of syntactic sugar, but at the end of the day, like C and Rust it's a fundamentally pass-by-value language.

Here's a good paper which describes some of the complexities and design decisions for a full and complete closure implementation that doesn't require decoration and compiler hinting:

  http://www.cs.tufts.edu/~nr/cs257/archive/roberto-ierusalimschy/closures-draft.pdf

reply

wahern 2 days ago [-]

More general issue: https://en.wikipedia.org/wiki/Funarg_problem


jdblair 2 days ago [-]

This is important nuance that makes complete sense but that I didn't actually appreciate. Thank you.


---

" How did we get here? Well, C++ and its stupid O(n^2) compilation complexity. As an application grows, the number of header files grows because, as any sane and far-thinking programmer would do, we split the complexity up over multiple header files, factor it into modules, and try to create encapsulation with getter/setters. However, to actually have the C++ compiler do inlining at compile time (LTO be damned), we have to put the definitions of inline functions into header files, which greatly increases their size and processing time. Moreover, because the C++ compiler needs to see full class definitions to, e.g., know the size of an object and its inheritance relationships, we have to put the main meat of every class definition into a header file! Don't even get me started on templates. Oh, and at the end of the day, the linker has to clean up the whole mess, discarding the vast majority of the compiler's output due to so many duplicated functions. And this blowup can be huge. A debug build of V8, which is just a small subsystem of Chrome, will generate about 1.4GB of .o files which link to a 75MB .so file and 1.2MB startup executable--that's a 18x blowup. " -- [6]

---

don't optimize out checks for illegal states:

http://www.kb.cert.org/vuls/id/162289

---

to3m 2 days ago [-]

> As far as I can tell you can’t step through tests in the debugger

Rant time. What to do when you write a test framework.

1. When a test fails, print everything possible: desired expression, desired value, actual expression, actual value, line and file of failing call. When printing line and file, make the format configurable, with the default being the prevailing standard for the system, so that people can get clickable links in their text editor or terminal with a minimum of hassle.

About 90% of the time, this will give people everything they need in order to at least get started on fixing the failed test

2. Don't make it hard to use the debugger. The remaining 10% of the time, people will need to step through the code. Some measurable fraction of the other 90%, they'll also end up needing to do this, because it looked like it was something simple but actually it was more than that. So don't make this hard

3. See steps 1 and 2

This might sound obvious, but it clearly isn't, because I've used several test frameworks that make running in the debugger rocket science, and print literally nothing but your own message (if even that) when the test fails. Like, you do 'assert.equals(x,y)', and it doesn't even show you what the values of x and y are, let alone figure out that maybe "x!=y" would be an obvious thing to print.

This may not sound like a big deal with my stupid little example, but after you've written several tens of these you will start to see my point.

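a tiny sketch of point 1 in Python (real frameworks do much more; the point is just "report both values plus the call site"):

    import inspect

    def check_equal(actual, expected):
        if actual == expected:
            return
        caller = inspect.stack()[1]                      # caller's file, line and source text
        src = (caller.code_context or [""])[0].strip()
        raise AssertionError(
            f"{caller.filename}:{caller.lineno}: {src}\n"
            f"  actual:   {actual!r}\n"
            f"  expected: {expected!r}"
        )

    try:
        check_equal(2 + 2, 5)
    except AssertionError as e:
        print(e)    # prints file:line, the failing call text, and both values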

---

" Debugging is also a nightmare. There's no way to step through function calls or set break points, even though it's basically just Javascript. Multiple classes of bugs will generate the same "invalid opcode" error that is only visible at run-time with no indication of what line even triggered the error. "

---

hrpnk 67 days ago [-]

In fact the tooling around Solidity and smart contracts is quite rich. It's possible to write unit tests, execute contract code on a private chain in a similar way to integration/API tests, and measure the code coverage.

Sadly, it's not a common practice to develop contracts with full use of such tooling. See [0] for an analysis of the recent 2017 ICOs and their use of the tooling in crowdsale contracts. A few gems exist, that do quite thorough testing including a CI approach.

https://medium.com/@bocytko/would-you-trust-your-money-to-a-smart-contract-3581063d52b1

---

https://kristerw.blogspot.com/2017/09/useful-gcc-warning-options-not-enabled.html

---

 minexew 22 hours ago [-]

My absolute favorite for modern C++ is

-Werror=switch

Using this, you can enforce that a switch statement over an enum value is complete (covers all cases) simply by omitting the default case.


asveikau 21 hours ago [-]

You're not the first person I've seen suggest that this condition is worth flagging, but I don't completely understand why. People who conform to such a style then go on to add a "default" case that is a no-op, which I guess is more explicit, but also feels a little tedious and verbose. Switch statements in C can do all sorts of funky things (vide: Duff's device) so I think omitting a case label for an enum value feels like a much lesser crime. What if you just don't have meaningful things to do for that enum value?


jlg23 20 hours ago [-]

> I think omitting a case label for an enum value feels like a much lesser crime.

This is not about the coding police coming to get you but preventing you from shooting yourself into your feet when you extend the enumeration. Quite often you want to cover all cases and write code that signals an error if the default branch is called. CL even has a special version of the CASE macro (CL's "switch") called ECASE[1] with the only difference being that ECASE signals an error on unhandled cases.

[1] http://www.lispworks.com/documentation/HyperSpec/Body/m_case...

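a comparable effect exists in Python land too (shown only as an analogy to the above, via mypy's exhaustiveness checking rather than a compiler flag; the enum is made up): passing the "leftover" value to an assert_never helper makes the type checker complain if any member is unhandled. typing.assert_never ships with 3.11; a hand-rolled version works everywhere:

    import enum
    from typing import NoReturn

    class Color(enum.Enum):
        RED = 1
        GREEN = 2

    def assert_never(value: NoReturn) -> NoReturn:
        raise AssertionError(f"unhandled case: {value!r}")

    def name(c: Color) -> str:
        if c is Color.RED:
            return "red"
        elif c is Color.GREEN:
            return "green"
        else:
            assert_never(c)   # mypy errors here if some Color member is not covered above

    print(name(Color.GREEN))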

---

http://www.smashcompany.com/technology/why-would-anyone-choose-docker-over-fat-binaries talks about the importance of statically linked executables including all of their dependencies even glibc.

---

" IntelliJ?. Rust has official support in JetBrains’? IDEs (IntelliJ? IDEA, CLion, WebStorm?, etc.), which includes:

    Finding types, functions and traits across the whole project, its dependencies and the standard library.
    Hierarchical overview of the symbols defined in the current file.
    Search for all implementations of a given trait.
    Go to definition of symbol at cursor.
    Navigation to the parent module.
    Refactoring and code generation

RLS. The RLS is an editor-independent source of intelligence about Rust programs. It is used to power Rust support in many editors including Visual Studio Code, Visual Studio, and Atom, with more in the pipeline. It is on schedule for a 1.0 release in early 2018, but is currently available in preview form for all channels (nightly, beta, and stable). It supports:

    Code completion (using Racer)
    Go to definition (and peek definition if the editor supports it)
    Find all references
    Find impls for a type or trait
    Symbol search (current file and project)
    Reformatting using rustfmt, renaming
    Apply error suggestions (e.g., to add missing imports)
    Docs and types on hover
    Code generation using snippets
    Cargo tasks
    Installation and update of the RLS (via rustup)

"

---

"tool for automatically adding type annotations to your Python 3 code via runtime tracing of types seen."

eg https://instagram-engineering.com/let-your-code-type-hint-itself-introducing-open-source-monkeytype-a855c7284881
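MonkeyType's basic workflow is: run the code under its tracer, then generate or apply stubs from the recorded call types (module names below are placeholders):

    monkeytype run mytests.py              # execute with tracing; traces go into a local sqlite db
    monkeytype stub mypackage.mymodule     # print a .pyi stub inferred from the traces
    monkeytype apply mypackage.mymodule    # rewrite the module in place with the inferred annotations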

---

https://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/

---

https://wiki.haskell.org/GHC/Typed_holes

---

consider having a 'config' directory for project config files instead of just putting them in project root

https://github.com/nodejs/tooling/issues/79 https://news.ycombinator.com/item?id=24066748

---

https://romefrontend.dev/blog/2020/08/08/introducing-rome.html is by a guy who started Babel and Yarn and wanted one tool to "power minifiers, linters, formatters, syntax highlighters, code completion tools, type checkers, codemod tools, and every other tool" because "A linter in the JavaScript ecosystem is exactly the same as a compiler" -- https://romefrontend.dev/blog/2020/08/08/introducing-rome.html

so, we should probably look at the structure and/or (internal?) API of Rome for some tips here

some notes: https://romefrontend.dev/#linting

they dont plan on a plugin system anytime soon tho:

https://news.ycombinator.com/item?id=24096282

so i guess just look at their source code to see the data structures they use (AST etc)

---

"TS, ESlint, Prettier, Webpack, Babel" -- purported js best practices [7], Aug 2020

---

the prevailing consensus esp in the JS world is that 1 tool should do 1 thing. This is fine under the Unix philosophy, but challenges arise due to the combinatorial explosion of config needs, bad errors, and overhead from crossing module boundaries. There are a number of attempts at challenging this status quo:

As we consolidate on the jobs to be done we expect out of modern tooling, it makes sense to do all these in a single pass with coherent tooling. It will not make sense for a large swathe of legacy setups, but once these tools are battle tested, they would be my clear choice for greenfield projects.

recommended related reads:


---

deno used to use flatbuffers but switched to json:

" Long standing problem was use of Flatbuffers for serialization of data between Rust and V8 - ie. serialization of “ops”. Although Flatbuffers are fast, they’re also very complicated to use and significantly increase complexity of build process. However Flatbuffers are no more! After reorganization of dispatch mechanism and porting ops code Deno now uses JSON for all serialization except super-hot “read”/“write” ops which use custom minimal binary format. Whole operation was done without significant performance impact! Read more about it in #2796, #2799 and #2801 " -- Deno Newsletter September 3 · Issue #32 · View online https://www.getrevue.co/profile/DenoNewsletter/issues/deno-newsletter-32-197490

they used to use it for 'ops', whatever that means (are these interpreter ops?); see partial list of ops:

https://github.com/denoland/deno/issues/2801?utm_campaign=Deno%20Newsletter&utm_medium=email&utm_source=Revue%20newsletter (and in other issues ref'd by that one eg https://github.com/denoland/deno/pull/2804 )

oh this explains what ops are: https://denolib.gitbook.io/guide/codebase-basics/infrastructure https://denolib.gitbook.io/guide/codebase-basics/more-components https://denolib.gitbook.io/guide/codebase-basics/example-adding-a-deno-api

it's used in the (interpreter?) in message passing from typescript to Rust to 'do all the fancy stuff' / do privileged operations such as file I/O

some of deno's architecture is described in https://en.wikipedia.org/w/index.php?title=Deno_(software)&oldid=972081625

"Using message passing channels for invoking privileged system APIs and using bindings."

" the prototype of Deno, aiming to achieve system call bindings through message passing with serialization tools such as Protocol Buffers, and to provide command line flags for access control.

Deno was initially written in Go and used Protocol Buffers for serialization between privileged (Go, with system call access) and unprivileged (V8) sides.[11] However, Go was soon replaced with Rust due to concerns of double runtime and garbage collection pressure.[12] Tokio is introduced in place of libuv as the asynchronous event-driven platform,[13] and Flatbuffers is adopted for faster, "zero-copy" serialization and deserialization[14] but later in August 2019, FlatBuffers were finally removed[15] after publishing benchmarks that measured a significant overhead of serialization in April 2019.[16] "

---

askvictor 4 hours ago [–]

I only wish Micropython had a real debugger (pdb).


matt_trentini 3 hours ago [–]

  #5026 was merged last year; it provides the sys.settrace primitive which paves the way for debugging support.

https://github.com/micropython/micropython/pull/5026

Still quite a lot of work left for pdb-like features but I'm optimistic it'll get there!

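sys.settrace is the same hook CPython's pdb is built on; a minimal sketch of a tracer that prints each line as it executes:

    import sys

    def tracer(frame, event, arg):
        if event == "line":
            print(f"{frame.f_code.co_filename}:{frame.f_lineno}")
        return tracer              # keep receiving 'line' events for this frame

    def demo():
        x = 1
        y = x + 1
        return y

    sys.settrace(tracer)
    demo()
    sys.settrace(None)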

---

"What does an operating system kernel designed for debugging experience look like? What does a memory model or memory allocator designed for debugging experience look like? Are the tools we have today good enough foundations for massive open-source communities like NPM? How can we improve the programming environment and our tools to make the most of package repositories with hundreds of millions of libraries and billions of lines of code?" -- [8]

---

1 point by bshanks 29 days ago

on: Emacs 27.1

If you were designing a programming language with IDE tooling in mind, what are the things that Common Lisp and Smalltalk do to better assist IDEs than Python/Ruby?

pjmlp 29 days ago


Anders explains it better than me,

"Anders Hejlsberg on Modern Compiler Construction"

https://www.youtube.com/watch?v=wSdV1M7n4gQ

Basically the ability to see the language AST at all times, do type checking with broken code, incremental compilation, and a REPL and IDE editing experience fused together.

Here is another short examples,

"7 minutes of Pharo Smalltalk for Rubyists"

https://www.youtube.com/watch?v=HOuZyOKa91o

"MountainWest? RubyConf? 2014 - But Really, You Should Learn Smalltalk"

https://www.youtube.com/watch?v=eGaKZBr0ga4

---

" 14. --pretty=expanded and cargo expand

In C — especially C that makes heavy use of the preprocessor — the -E option can be invaluable: it stops the compilation after the preprocessing phase and dumps the result to standard output. Rust, as it turns out has an equivalent in the --pretty=expanded unstable compiler option. "

---

Julia's @time macro prints out the time taken by some function, but also the amount of memory allocated within it (gross, i assume)

[9]
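there's no direct Python equivalent of @time, but a rough analogue of the idea (wall time plus allocation tracking, not Julia's macro) can be sketched with time.perf_counter and tracemalloc:

    import time, tracemalloc

    def timed(fn, *args):
        tracemalloc.start()
        t0 = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()   # (current, peak) bytes allocated since start()
        tracemalloc.stop()
        print(f"{elapsed:.6f} seconds, peak {peak} bytes allocated")
        return result

    timed(sorted, list(range(100000)))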

---

random post on 'outlining', the opposite of inlining, to reduce binary size: https://eng.uber.com/how-uber-deals-with-large-ios-app-size/

this post talks about more size optimizations: https://news.ycombinator.com/item?id=26278358

---

"I/O for imports. There is an algorithmic problem here if you have M entries on PYTHONPATH, and N imports, then the Python startup algorithm does O(M*N)*constant stat() calls."

---

https://earthly.dev/ https://earthly.dev/blog/dont-use-earthly/ Getting a Repeatable Build, Every Time

---